<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:planet="http://planet.intertwingly.net/" xmlns:indexing="urn:atom-extension:indexing" indexing:index="no"><access:restriction xmlns:access="http://www.bloglines.com/about/specs/fac-1.0" relationship="deny"/>
  <title>Planet Topic Maps</title>
  <updated>2012-05-17T12:17:59Z</updated>
  <generator uri="http://intertwingly.net/code/venus/">Venus</generator>
  <author>
    <name>Arnar Lundesgaard</name>
    <email>arnar.lundesgaard@bouvet.no</email>
  </author>
  <id>http://planet.topicmaps.org/atom.xml</id>
  <link href="http://planet.topicmaps.org/atom.xml" rel="self" type="application/atom+xml"/>
  <link href="http://planet.topicmaps.org/" rel="alternate"/>

  <entry xml:lang="en">
    <id>http://tm.durusau.net/?p=25282</id>
    <link href="http://tm.durusau.net/?p=25282" rel="alternate" type="text/html"/>
    <link href="http://tm.durusau.net/?p=25282#comments" rel="replies" type="text/html"/>
    <link href="http://tm.durusau.net/?feed=atom&amp;p=25282" rel="replies" type="application/atom+xml"/>
    <title xml:lang="en">Google Advertises Topic Maps – Breaking News – Please ReTweet</title>
    <summary xml:lang="en">Actually the post is titled: Introducing the Knowledge Graph: things, not strings. It reads in part: Search is a lot about discovery—the basic human need to learn and broaden your horizons. But searching still requires a lot of hard work by you, the user. So today I’m really excited to launch the Knowledge Graph, which [...]</summary>
    <content type="xhtml" xml:lang="en"><div xmlns="http://www.w3.org/1999/xhtml"><p>Actually the post is titled: <a href="http://googleblog.blogspot.com.au/2012/05/introducing-knowledge-graph-things-not.html">Introducing the Knowledge Graph: things, not strings</a>.</p>
<p>It reads in part:</p>
<blockquote><p>Search is a lot about discovery—the basic human need to learn and broaden your horizons. But searching still requires a lot of hard work by you, the user. So today I’m really excited to launch the Knowledge Graph, which will help you discover new information quickly and easily.</p>
<p>Take a query like [taj mahal]. For more than four decades, search has essentially been about matching keywords to queries. To a search engine the words [taj mahal] have been just that—two words.</p>
<p>But we all know that [taj mahal] has a much richer meaning. You might think of one of the world’s most beautiful monuments, or a Grammy Award-winning musician, or possibly even a casino in Atlantic City, NJ. Or, depending on when you last ate, the nearest Indian restaurant. It’s why we’ve been working on an intelligent model—in geek-speak, a “graph”—that understands real-world entities and their relationships to one another: things, not strings.</p>
<p>The Knowledge Graph enables you to search for things, people or places that Google knows about—landmarks, celebrities, cities, sports teams, buildings, geographical features, movies, celestial objects, works of art and more—and instantly get information that’s relevant to your query. This is a critical first step towards building the next generation of search, which taps into the collective intelligence of the web and understands the world a bit more like people do.</p>
<p>Google’s Knowledge Graph isn’t just rooted in public sources such as Freebase, Wikipedia and the CIA World Factbook. It’s also augmented at a much larger scale—because we’re focused on comprehensive breadth and depth. It currently contains more than 500 million objects, as well as more than 3.5 billion facts about and relationships between these different objects. And it’s tuned based on what people search for, and what we find out on the web.</p></blockquote>
<p>Google just set the bar for search/information appliances, including topic maps. </p>
<p>What is the value-add of your appliance compared to Google? </p>
<p>When people ask me to explain topic maps now I can say: </p>
<p>You know Google’s Knowledge Graph? It’s like that but customized to your interests and data.</p>
<p>(I would just leave it at that. Let them start imagining what they want to do beyond the reach of Google. In their “dark data.”)</p>
<p>Who knew? Google advertising for topic maps. Without any click-through. Amazing. </p></div>
    </content>
    <updated>2012-05-16T20:50:27Z</updated>
    <published>2012-05-16T20:50:27Z</published>
    <category scheme="http://tm.durusau.net" term="Google Knowledge Graph"/>
    <category scheme="http://tm.durusau.net" term="Marketing"/>
    <category scheme="http://tm.durusau.net" term="Topic Maps"/>
    <author>
      <name>Patrick Durusau</name>
    </author>
    <source>
      <id>http://tm.durusau.net/?feed=atom</id>
      <link href="http://tm.durusau.net" rel="alternate" type="text/html"/>
      <link href="http://tm.durusau.net/?feed=atom" rel="self" type="application/atom+xml"/>
      <subtitle xml:lang="en">Patrick Durusau on Topic Maps and Semantic Diversity</subtitle>
      <title xml:lang="en">Another Word For It</title>
      <updated>2012-05-16T20:50:27Z</updated>
    </source>
  </entry>

  <entry xml:lang="en">
    <id>http://tm.durusau.net/?p=25279</id>
    <link href="http://tm.durusau.net/?p=25279" rel="alternate" type="text/html"/>
    <link href="http://tm.durusau.net/?p=25279#comments" rel="replies" type="text/html"/>
    <link href="http://tm.durusau.net/?feed=atom&amp;p=25279" rel="replies" type="application/atom+xml"/>
    <title xml:lang="en">Mobilizing Knowledge Networks for Development</title>
    <summary xml:lang="en">Mobilizing Knowledge Networks for Development June 19–20, 2012 The World Bank Group 1818 H Street NW, Washington DC 20433 From the webpage: The goal of the workshop is to explore ways to become better providers and connectors of knowledge in a world where the sources of knowledge are increasingly diverse and disbursed. At the World [...]</summary>
    <content type="xhtml" xml:lang="en"><div xmlns="http://www.w3.org/1999/xhtml"><p><a href="http://web.worldbank.org/WBSITE/EXTERNAL/PROJECTS/0,,contentMDK:23189356~pagePK:41367~piPK:51533~theSitePK:40941,00.html">Mobilizing Knowledge Networks for Development</a></p>
<p><strong>June 19–20, 2012<br/>
The World Bank Group<br/>
1818 H Street NW, Washington DC 20433</strong></p>
<p>From the webpage:</p>
<blockquote><p>The goal of the workshop is to explore ways to become better providers and connectors of knowledge in a world where the sources of knowledge are increasingly diverse and disbursed. At the World Bank, for example, we are seeking ways to connect with new centers of research, emerging communities of practice, and tap the practical experience of development organizations and the policy makers in rapidly developing economies. Our goal is to find better ways to connect those that have the development knowledge with those that need it, when they need it.</p>
<p>We are also seeking to engage research communities and civil society organizations through an Open Development initiative that makes data and publications freely available. We understand that many other organizations are exploring similar initiatives. The Conference and Knowledge fair will provide an opportunity for knowledge organizations working in development to learn from one another about their knowledge services, practices, and successes and challenges in providing these services.</p></blockquote>
<p>You can register to attend in person or over the Internet. </p>
<p>As always, networking opportunities are what you make of them. This will be a good opportunity to spread the good news about topic maps. </p></div>
    </content>
    <updated>2012-05-16T20:35:50Z</updated>
    <published>2012-05-16T20:35:50Z</published>
    <category scheme="http://tm.durusau.net" term="Conferences"/>
    <category scheme="http://tm.durusau.net" term="Marketing"/>
    <author>
      <name>Patrick Durusau</name>
    </author>
    <source>
      <id>http://tm.durusau.net/?feed=atom</id>
      <link href="http://tm.durusau.net" rel="alternate" type="text/html"/>
      <link href="http://tm.durusau.net/?feed=atom" rel="self" type="application/atom+xml"/>
      <subtitle xml:lang="en">Patrick Durusau on Topic Maps and Semantic Diversity</subtitle>
      <title xml:lang="en">Another Word For It</title>
      <updated>2012-05-16T20:50:27Z</updated>
    </source>
  </entry>

  <entry xml:lang="en">
    <id>http://tm.durusau.net/?p=25275</id>
    <link href="http://tm.durusau.net/?p=25275" rel="alternate" type="text/html"/>
    <link href="http://tm.durusau.net/?p=25275#comments" rel="replies" type="text/html"/>
    <link href="http://tm.durusau.net/?feed=atom&amp;p=25275" rel="replies" type="application/atom+xml"/>
    <title xml:lang="en">From the Bin Laden Letters: Mapping OBL’s Reach into Yemen</title>
    <summary xml:lang="en">From the Bin Laden Letters: Mapping OBL’s Reach into Yemen I puzzled over this headline. A close friend refers to President Obama as “OB1” so I had a moment of confusion when reading the headline. Didn’t make sense for Bin Laden’s letters to map President Obama’s reach into Yemen. With some diplomatic cables and White [...]</summary>
    <content type="xhtml" xml:lang="en"><div xmlns="http://www.w3.org/1999/xhtml"><p><a href="http://analysisintelligence.com/osint/aqap-leadership-social-network/">From the Bin Laden Letters: Mapping OBL’s Reach into Yemen</a></p>
<p>I puzzled over this headline. A close friend refers to President Obama as “OB1,” so I had a moment of confusion when reading the headline. It didn’t make sense for Bin Laden’s letters to map President Obama’s reach into Yemen.</p>
<p>With some diplomatic cables and White House internal documents, that would be an interesting visualization as well.</p>
<p>The visualizations result from mining a larger corpus of 70,000+ public sources for individuals mentioned in the Bin Laden letters.</p>
<p>What we don’t know is what means of analysis produced the visualizations in question. </p>
<p>For example, some process was used to reduce redundant references to the same actors, events and relationships.</p>
<p>That isn’t a complaint, simply an observation. Without knowing the process, it isn’t possible to evaluate the techniques used to obtain the results.</p>
<p>It would be interesting to see <a href="https://www.recordedfuture.com">Recorded Future</a> in one of the <a href="http://trec.nist.gov/">TREC</a> competitions. At least then the results would be against a shared data set. </p>
<p>Do be aware that when the text says “open source,” what is meant is “<a href="http://en.wikipedia.org/wiki/Open-source_intelligence">open source intelligence</a>.” </p>
<p>The better practice would be to say “open source intelligence (OSINT)” and not “open source,” the latter having a well-recognized meaning in the software community. </p></div>
    </content>
    <updated>2012-05-16T20:25:45Z</updated>
    <published>2012-05-16T20:25:45Z</published>
    <category scheme="http://tm.durusau.net" term="Intelligence"/>
    <author>
      <name>Patrick Durusau</name>
    </author>
    <source>
      <id>http://tm.durusau.net/?feed=atom</id>
      <link href="http://tm.durusau.net" rel="alternate" type="text/html"/>
      <link href="http://tm.durusau.net/?feed=atom" rel="self" type="application/atom+xml"/>
      <subtitle xml:lang="en">Patrick Durusau on Topic Maps and Semantic Diversity</subtitle>
      <title xml:lang="en">Another Word For It</title>
      <updated>2012-05-16T20:50:27Z</updated>
    </source>
  </entry>

  <entry xml:lang="en">
    <id>http://tm.durusau.net/?p=25272</id>
    <link href="http://tm.durusau.net/?p=25272" rel="alternate" type="text/html"/>
    <link href="http://tm.durusau.net/?p=25272#comments" rel="replies" type="text/html"/>
    <link href="http://tm.durusau.net/?feed=atom&amp;p=25272" rel="replies" type="application/atom+xml"/>
    <title xml:lang="en">Need cash? NLnet advances open source technology by funding new projects</title>
    <summary xml:lang="en">Need cash? NLnet advances open source technology by funding new projects Next Round of Ideas Due: June 1st 2012. Lead story at OpenSource.com today. From the story: If you have a valuable idea or project that can help create a more open global information society, and are looking for financial means to make your ideas [...]</summary>
    <content type="xhtml" xml:lang="en"><div xmlns="http://www.w3.org/1999/xhtml"><p><a href="http://opensource.com/law/12/5/nlnet-advances-open-source-technology">Need cash? NLnet advances open source technology by funding new projects </a></p>
<p><strong>Next Round of Ideas Due: June 1st 2012.</strong></p>
<p>Lead story at <a href="http://opensource.com">OpenSource.com</a> today.</p>
<p>From the story:</p>
<blockquote><p>If you have a valuable idea or project that can help create a more open global information society, and are looking for financial means to make your ideas come through, we might be able to help you. Indeed our mission is to fund open source projects and individuals to improve important and strategic networking technologies for the better of mankind. Whether this concerns more robust internet technologies and standards, privacy enhancing technologies or open document formats – we are open for your proposals.</p>
<p>We are independent. We are not like other funding bodies you may have experience with, because we only have to judge on quality and relevance, and not on politics or any other dimension. What is important for us is that the technology you develop and promote is usable for others and has real impact. And we are also interested to hear your inspiring ideas if you are unable to manage it yourself.</p>
<p>We spend our money in supporting strategic initiatives that contribute to an open information society, especially where these are aimed at development and dissemination of open standards and network related technology.</p></blockquote>
<p>More details in the story or at the <a href="http://www.nlnet.nl/">NLnet website</a>.</p>
<p>What’s your great idea? </p></div>
    </content>
    <updated>2012-05-16T18:52:18Z</updated>
    <published>2012-05-16T18:52:18Z</published>
    <category scheme="http://tm.durusau.net" term="Funding"/>
    <category scheme="http://tm.durusau.net" term="Open Source"/>
    <author>
      <name>Patrick Durusau</name>
    </author>
    <source>
      <id>http://tm.durusau.net/?feed=atom</id>
      <link href="http://tm.durusau.net" rel="alternate" type="text/html"/>
      <link href="http://tm.durusau.net/?feed=atom" rel="self" type="application/atom+xml"/>
      <subtitle xml:lang="en">Patrick Durusau on Topic Maps and Semantic Diversity</subtitle>
      <title xml:lang="en">Another Word For It</title>
      <updated>2012-05-16T20:50:27Z</updated>
    </source>
  </entry>

  <entry xml:lang="en">
    <id>http://tm.durusau.net/?p=25269</id>
    <link href="http://tm.durusau.net/?p=25269" rel="alternate" type="text/html"/>
    <link href="http://tm.durusau.net/?p=25269#comments" rel="replies" type="text/html"/>
    <link href="http://tm.durusau.net/?feed=atom&amp;p=25269" rel="replies" type="application/atom+xml"/>
    <title xml:lang="en">OpenSource.com</title>
    <summary xml:lang="en">OpenSource.com Not sure how I got to OpenSource.com but it showed up as a browser tab after a crash. Maybe it is a new feature and not a bug. Thought I would take the opportunity to point it out (and record it here) as a source of projects and news from the open source community. [...]</summary>
    <content type="xhtml" xml:lang="en"><div xmlns="http://www.w3.org/1999/xhtml"><p><a href="http://opensource.com">OpenSource.com</a></p>
<p>Not sure how I got to <a href="http://opensource.com">OpenSource.com</a> but it showed up as a browser tab after a crash. Maybe it is a new feature and not a bug.</p>
<p>Thought I would take the opportunity to point it out (and record it here) as a source of projects and news from the open source community. </p>
<p>Not to mention data sets, source code, marketing opportunities, etc.</p></div>
    </content>
    <updated>2012-05-16T18:30:35Z</updated>
    <published>2012-05-16T18:30:35Z</published>
    <category scheme="http://tm.durusau.net" term="Open Data"/>
    <category scheme="http://tm.durusau.net" term="Open Source"/>
    <author>
      <name>Patrick Durusau</name>
    </author>
    <source>
      <id>http://tm.durusau.net/?feed=atom</id>
      <link href="http://tm.durusau.net" rel="alternate" type="text/html"/>
      <link href="http://tm.durusau.net/?feed=atom" rel="self" type="application/atom+xml"/>
      <subtitle xml:lang="en">Patrick Durusau on Topic Maps and Semantic Diversity</subtitle>
      <title xml:lang="en">Another Word For It</title>
      <updated>2012-05-16T20:50:27Z</updated>
    </source>
  </entry>

  <entry xml:lang="en">
    <id>http://tm.durusau.net/?p=25262</id>
    <link href="http://tm.durusau.net/?p=25262" rel="alternate" type="text/html"/>
    <link href="http://tm.durusau.net/?p=25262#comments" rel="replies" type="text/html"/>
    <link href="http://tm.durusau.net/?feed=atom&amp;p=25262" rel="replies" type="application/atom+xml"/>
    <title xml:lang="en">Identifying And Weighting Integration Hypotheses On Open Data Platforms</title>
    <summary xml:lang="en">Identifying And Weighting Integration Hypotheses On Open Data Platforms by Julian Eberius, Katrin Braunschweig, Maik Thiele, and Wolfgang Lehner. Abstract: Open data platforms such as data.gov or opendata.socrata.com provide a huge amount of valuable information. Their free-for-all nature, the lack of publishing standards and the multitude of domains and authors represented on these platforms lead [...]</summary>
    <content type="xhtml" xml:lang="en"><div xmlns="http://www.w3.org/1999/xhtml"><p><a href="http://arxiv.org/abs/1205.2465">Identifying And Weighting Integration Hypotheses On Open Data Platforms</a> by Julian Eberius, Katrin Braunschweig, Maik Thiele, and Wolfgang Lehner.</p>
<p>Abstract:</p>
<blockquote><p>Open data platforms such as <a href="http://data.gov">data.gov</a> or <a href="http://opendata.socrata.com">opendata.socrata.com</a> provide a huge amount of valuable information. Their free-for-all nature, the lack of publishing standards and the multitude of domains and authors represented on these platforms lead to new integration and standardization problems. At the same time, crowd-based data integration techniques are emerging as new way of dealing with these problems. However, these methods still require input in form of specific questions or tasks that can be passed to the crowd. This paper discusses integration problems on Open Data Platforms, and proposes a method for identifying and ranking integration hypotheses in this context. We will evaluate our findings by conducting a comprehensive evaluation using on one of the largest Open Data platforms. </p></blockquote>
<p>This is interesting work on Open Data platforms but it is marred by claims such as:</p>
<blockquote><p>Open Data Platforms have some unique integration problems that do not appear in classical integration scenarios and which can only be identified using a global view on the level of datasets. These problems include partial- or duplicated datasets, partitioned datasets, versioned datasets and others, which will be described in detail in Section 4.</p></blockquote>
<p>Really?</p>
<p>That would come as a surprise to the <a href="http://www.gaw-wdca.org/">World Data Centre for Aerosols</a>, which had <a href="ftp://ftp-ccu.jrc.it/pub/WDCA/docs/SINGADS_final_report.doc">Synthesis and INtegration of Global Aerosol Data Sets. Contract No. ENV4-CT98-0780 (DG 12 –EHKN)</a> produced on data sets from 1999 to 2001. One of the specific issues it addressed was <strong>duplicate data sets</strong>.</p>
<p>Work from more than a decade ago counts as a “classical integration scenario,” I think.</p>
<p>Another quibble: the cited sources do not support the text.</p>
<blockquote><p>New forms of data management such as dataspaces and pay-as-you-go data integration [2, 6] are a hot topic in database research. They are strongly related to Open Data Platforms in that they assume large sets of heterogeneous data sources lacking a global or mediated schemata, which still should be queried uniformly.</p>
<p>…</p>
<p>2 M. Franklin, A. Halevy, and D. Maier. From databases to dataspaces: a new abstraction for information management. SIGMOD Rec., 34:27–33, December 2005.</p>
<p>…</p>
<p>6 J. Madhavan, S. R. Jeffery, S. Cohen, X. . Dong, D. Ko, C. Yu, A. Halevy, and G. Inc. Web-scale Data Integration: You Can Only Afford to Pay As You Go. In Proc. of CIDR-07, 2007.
</p></blockquote>
<p>Articles written seven (7) and five (5) years ago do not justify a “<strong>hot topic(s) in database research</strong>” claim today.</p>
<p>There are other issues, major and minor, but for all that, this is important work. </p>
<p>I want to see reports that do justice to its importance.</p></div>
    </content>
    <updated>2012-05-16T17:58:27Z</updated>
    <published>2012-05-16T17:58:27Z</published>
    <category scheme="http://tm.durusau.net" term="Crowd Sourcing"/>
    <category scheme="http://tm.durusau.net" term="Data Integration"/>
    <category scheme="http://tm.durusau.net" term="Integration"/>
    <category scheme="http://tm.durusau.net" term="Open Data"/>
    <author>
      <name>Patrick Durusau</name>
    </author>
    <source>
      <id>http://tm.durusau.net/?feed=atom</id>
      <link href="http://tm.durusau.net" rel="alternate" type="text/html"/>
      <link href="http://tm.durusau.net/?feed=atom" rel="self" type="application/atom+xml"/>
      <subtitle xml:lang="en">Patrick Durusau on Topic Maps and Semantic Diversity</subtitle>
      <title xml:lang="en">Another Word For It</title>
      <updated>2012-05-16T20:50:27Z</updated>
    </source>
  </entry>

  <entry xml:lang="en">
    <id>http://tm.durusau.net/?p=25256</id>
    <link href="http://tm.durusau.net/?p=25256" rel="alternate" type="text/html"/>
    <link href="http://tm.durusau.net/?p=25256#comments" rel="replies" type="text/html"/>
    <link href="http://tm.durusau.net/?feed=atom&amp;p=25256" rel="replies" type="application/atom+xml"/>
    <title xml:lang="en">Modeling vs Mining?</title>
    <summary xml:lang="en">Steve Miller writes in Politics of Data Models and Mining: I recently came across an interesting thread, “Is data mining still a sin against the norms of econometrics?”, from the Advanced Business Analytics LinkedIn Discussion Group. The point of departure for the dialog is a paper entitled “Three attitudes towards data mining”, written by couple [...]</summary>
    <content type="xhtml" xml:lang="en"><div xmlns="http://www.w3.org/1999/xhtml"><p>Steve Miller writes in <a href="http://www.information-management.com/blogs/data-model-mining-DM-politics-Breiman-Schrodt-10022472-1.html">Politics of Data Models and Mining</a>:</p>
<blockquote><p>I recently came across an interesting thread, “<a href="http://www.linkedin.com/groups/Is-data-mining-still-sin-35222.S.111125637?qid=655a27ec-119f-4558-a420-9ebf2d2f9266&amp;trk=group_most_popular-0-b-ttl&amp;goback=.gmr_35222.gmp_35222">Is data mining still a sin against the norms of econometrics?</a>”, from the Advanced Business Analytics LinkedIn Discussion Group. The point of departure for the dialog is a paper entitled “<a href="http://public.econ.duke.edu/~kdh9/Source%20Materials/Research/Three%20Attitudes.pdf">Three attitudes towards data mining</a>”, written by couple of academic econometricians.</p>
<p>The data mining “attitudes” range from the extremes that DM techniques are to be avoided like the plague, to one where “data mining is essential and that the only hope that we have of using econometrics to uncover true economic relationships is to be found in the intelligent mining of data.” The authors note that machine learning phobia is currently the norm in economics research.</p>
<p>Why is this? “Data mining is considered reprehensible largely because the world is full of accidental correlations, so that what a search turns up is thought to be more a reflection of what we want to find than what is true about the world.” In contrast, “Econometrics is regarded as hypothesis testing. Only a well specified model should be estimated and if it fails to support the hypothesis, it fails; and the economist should not search for a better specification.”</p>
<p>In other words, econometrics focuses on explanation, expecting its practitioners to generate hypotheses for testing with regression models. ML, on the other hand, obsesses on discovery and prediction, often content to let the data talk directly, without the distraction of “theory.” Just as bad, the results of black-box ML might not be readily interpretable for tests of economic hypotheses.</p></blockquote>
<p>Watching other communities fight over odd questions is always more enjoyable than serious disputes of grave concern in our own. (See <a href="http://tm.durusau.net/?p=25233">Using “Punning” to Answer httpRange-14</a> for example.)</p>
<p>I mention the economists’ dispute not simply to make jests at the expense of “econometricians.” (Do topic map supporters need a difficult name? TopicMapologists? Too short.)</p>
<p>The economists’ debate is missing an understanding that modeling requires some knowledge of the domain (mining, whether formal or informal) and mining requires some idea of an output (models, whether spoken or unspoken). That failing is all too common across modeling/mining domains. </p>
<p>To put it another way: </p>
<p>We never stumble upon data that is “untouched by human hands.”</p>
<p>We never build models without knowledge of the data we are modeling.</p>
<p>The relevant question is: Does the model or data mining provide a useful result? </p>
<p>(Typically measured by your client’s joy or sorrow over your results.)</p></div>
    </content>
    <updated>2012-05-16T17:07:59Z</updated>
    <published>2012-05-16T17:07:59Z</published>
    <category scheme="http://tm.durusau.net" term="Data Mining"/>
    <category scheme="http://tm.durusau.net" term="Data Models"/>
    <author>
      <name>Patrick Durusau</name>
    </author>
    <source>
      <id>http://tm.durusau.net/?feed=atom</id>
      <link href="http://tm.durusau.net" rel="alternate" type="text/html"/>
      <link href="http://tm.durusau.net/?feed=atom" rel="self" type="application/atom+xml"/>
      <subtitle xml:lang="en">Patrick Durusau on Topic Maps and Semantic Diversity</subtitle>
      <title xml:lang="en">Another Word For It</title>
      <updated>2012-05-16T20:50:27Z</updated>
    </source>
  </entry>

  <entry xml:lang="en">
    <id>http://tm.durusau.net/?p=25252</id>
    <link href="http://tm.durusau.net/?p=25252" rel="alternate" type="text/html"/>
    <link href="http://tm.durusau.net/?p=25252#comments" rel="replies" type="text/html"/>
    <link href="http://tm.durusau.net/?feed=atom&amp;p=25252" rel="replies" type="application/atom+xml"/>
    <title xml:lang="en">Progressive NoSQL Tutorials</title>
    <summary xml:lang="en">Have you ever gotten an advertising email with clean links in it? I mean a link without all the marketing crap appended to the end. The stuff you have to clean off before using it in a post or sending it to a friend? Got my first one today. From Skills Matter on the free [...]</summary>
    <content type="xhtml" xml:lang="en"><div xmlns="http://www.w3.org/1999/xhtml"><p>Have you ever gotten an advertising email with clean links in it? I mean a link without all the marketing crap appended to the end. The stuff you have to clean off before using it in a post or sending it to a friend?</p>
<p>Got my first one today. From <a href="http://skillsmatter.com">Skills Matter</a> on the free videos for their <a href="http://skillsmatter.com/event/nosql/progressive-nosql-tutorials/js-2090">Progressive NoSQL Tutorials</a> that just concluded.</p>
<p>High-quality presentations, videos freely available after each presentation, and friendly links in email are just a few of the reasons to support <a href="http://skillsmatter.com">Skills Matter</a>. </p>
<p>The tutorials:</p>
<ul>
<li>Cassandra
<ul>
<li><a href="http://skillsmatter.com/podcast/nosql/cassandra-x-factor/js-2090">Malcolm Box on Putting the X Factor into Cassandra</a></li>
<li><a href="http://skillsmatter.com/podcast/nosql/talk-3-3086/js-2090">Tom Wilkie on Apache Cassandra: a tunably consistent, highly-available, distributed database</a></li>
<li><a href="http://skillsmatter.com/podcast/nosql/tutorial-2-cassandra/js-2090">Tom Wilkie on Real-time analytics on the Twitter Firehose with Apache Cassandra</a></li>
</ul>
</li>
<li>Consistency
<ul>
<li><a href="http://skillsmatter.com/podcast/nosql/russell-brown-eventual-consistency/js-2090">Matt Heitzenroder on Eventual Consistency</a></li>
</ul>
</li>
<li>Couchbase
<ul>
<li><a href="http://skillsmatter.com/podcast/nosql/tutorial-1-couchbase/js-2090">John Zablocki’s Couchbase Server Tutorial</a></li>
<li><a href="http://skillsmatter.com/podcast/nosql/zablocki-couchbase/js-2090">John Zablocki on Developing with Couchbase</a></li>
</ul>
</li>
<li>CouchDB
<ul>
<li><a href="http://skillsmatter.com/podcast/nosql/couchdb-hutgroup/js-2090">Tom McMillen on CouchDB at the Hut Group</a></li>
</ul>
</li>
<li>MongoDB
<ul>
<li><a href="http://skillsmatter.com/podcast/nosql/chris-harris/js-2090">Chris Harris on MongoDB and Document Databases</a></li>
<li><a href="http://skillsmatter.com/podcast/nosql/handson-mongo/js-2090">Chris Harris’ Hands-on MongoDB Tutorial</a></li>
<li><a href="http://skillsmatter.com/podcast/nosql/mongodb-performance/js-2090">David Mytton on MongoDB performance at scale</a></li>
</ul>
</li>
<li>Neo4j
<ul>
<li><a href="http://skillsmatter.com/podcast/nosql/neo4j-tales-trenches-prognosql/js-2090">Nicki Watt &amp; Michal Bachman on Neo4j Tales from the Trenches: A recommendation Engine Case Study</a></li>
<li><a href="http://skillsmatter.com/podcast/nosql/neo4j-connected-data/js-2090">Jim Webber on Managing Highly Connected Data in Neo4j</a></li>
</ul>
</li>
<li>RavenDB
<ul>
<li><a href="http://skillsmatter.com/podcast/nosql/ayende-rahien-ravdendb/js-2090=">Oren Eini aka Ayende Rahien on RavenDB: A 2nd generation document database</a></li>
<li><a href="http://tm.durusau.net/href">Oren Eini’s RavenDB Crash Course</a></li>
<li><a href="http://skillsmatter.com/podcast/nosql/phil-jones/js-2090">Phil Jones on The challenges and rewards of using RavenDB</a></li>
</ul>
</li>
<li>Riak
<ul>
<li><a href="http://skillsmatter.com/podcast/nosql/riak-on-drugs-3084/js-2090">Rune Skou Larsen on Riak on Drugs (and the other way round)</a></li>
<li><a href="http://skillsmatter.com/podcast/nosql/tutorial-1-riak/js-2090">Ian Plosker’s Progressive Riak Tutorial</a></li>
</ul>
</li>
</ul></div>
    </content>
    <updated>2012-05-16T15:20:56Z</updated>
    <published>2012-05-16T15:20:56Z</published>
    <category scheme="http://tm.durusau.net" term="Cassandra"/>
    <category scheme="http://tm.durusau.net" term="CouchDB"/>
    <category scheme="http://tm.durusau.net" term="Couchbase"/>
    <category scheme="http://tm.durusau.net" term="MongoDB"/>
    <category scheme="http://tm.durusau.net" term="Neo4j"/>
    <category scheme="http://tm.durusau.net" term="NoSQL"/>
    <category scheme="http://tm.durusau.net" term="RavenDB"/>
    <category scheme="http://tm.durusau.net" term="Riak"/>
    <author>
      <name>Patrick Durusau</name>
    </author>
    <source>
      <id>http://tm.durusau.net/?feed=atom</id>
      <link href="http://tm.durusau.net" rel="alternate" type="text/html"/>
      <link href="http://tm.durusau.net/?feed=atom" rel="self" type="application/atom+xml"/>
      <subtitle xml:lang="en">Patrick Durusau on Topic Maps and Semantic Diversity</subtitle>
      <title xml:lang="en">Another Word For It</title>
      <updated>2012-05-16T20:50:27Z</updated>
    </source>
  </entry>

  <entry xml:lang="en">
    <id>http://tm.durusau.net/?p=25249</id>
    <link href="http://tm.durusau.net/?p=25249" rel="alternate" type="text/html"/>
    <link href="http://tm.durusau.net/?p=25249#comments" rel="replies" type="text/html"/>
    <link href="http://tm.durusau.net/?feed=atom&amp;p=25249" rel="replies" type="application/atom+xml"/>
    <title xml:lang="en">Lucene-1622</title>
    <summary xml:lang="en">Multi-word synonym filter (synonym expansion at indexing time) Lucene-1622 From the description: It would be useful to have a filter that provides support for indexing-time synonym expansion, especially for multi-word synonyms (with multi-word matching for original tokens). The problem is not trivial, as observed on the mailing list. The problems I was able to identify [...]</summary>
    <content type="xhtml" xml:lang="en"><div xmlns="http://www.w3.org/1999/xhtml"><p><a href="https://issues.apache.org/jira/browse/LUCENE-1622">Multi-word synonym filter (synonym expansion at indexing time) Lucene-1622</a></p>
<p>From the description:</p>
<blockquote><p>It would be useful to have a filter that provides support for indexing-time synonym expansion, especially for multi-word synonyms (with multi-word matching for original tokens).</p>
<p>The problem is not trivial, as observed on the mailing list. The problems I was able to identify (mentioned in the unit tests as well):</p>
<ul>
<li>if multi-word synonyms are indexed together with the original token stream (at overlapping positions), then a query for a partial synonym sequence (e.g., “big” in the synonym “big apple” for “new york city”) causes the document to match;</li>
<li>there are problems with highlighting the original document when synonym is matched (see unit tests for an example),
</li>
<li> if the synonym is of different length than the original sequence of tokens to be matched, then phrase queries spanning the synonym and the original sequence boundary won’t be found. Example “big apple” synonym for “new york city”. A phrase query “big apple restaurants” won’t match “new york city restaurants”.</li>
</ul>
<p>I am posting the patch that implements phrase synonyms as a token filter. This is not necessarily intended for immediate inclusion, but may provide a basis for many people to experiment and adjust to their own scenarios.</p></blockquote>
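<p>The first and third problems can be made concrete with a toy model in Python. This is only a sketch under stated assumptions. It is not Lucene’s API and not the patch attached to the issue. The index is a plain dictionary from each term to its token positions, and the helper names <code>index_with_synonyms</code> and <code>phrase_match</code> are hypothetical. Synonym tokens are stacked at the positions of the original tokens they replace, which is the overlapping-positions case described above.</p>
<pre>
```python
from collections import defaultdict


def index_with_synonyms(tokens, synonyms):
    """Toy positional index: term -> set of token positions.

    synonyms maps an original phrase to a synonym phrase. Wherever the
    original phrase occurs, each synonym token is indexed at the position
    of the corresponding original token (overlapping positions).
    """
    postings = defaultdict(set)
    for pos, tok in enumerate(tokens):
        postings[tok].add(pos)
    for original, synonym in synonyms.items():
        orig_toks = original.split()
        width = len(orig_toks)
        for start in range(len(tokens) - width + 1):
            if tokens[start:start + width] == orig_toks:
                for offset, syn_tok in enumerate(synonym.split()):
                    postings[syn_tok].add(start + offset)
    return postings


def phrase_match(postings, phrase):
    """True if the words of phrase occur at consecutive positions."""
    words = phrase.split()
    for start in postings.get(words[0], ()):
        if all(start + i in postings.get(w, ()) for i, w in enumerate(words)):
            return True
    return False


doc = "new york city restaurants".split()
postings = index_with_synonyms(doc, {"new york city": "big apple"})
print(phrase_match(postings, "big"))                        # True: partial synonym matches
print(phrase_match(postings, "big apple restaurants"))      # False: boundary phrase missed
print(phrase_match(postings, "new york city restaurants"))  # True
```
</pre>
<p>Here a one-word “phrase” stands in for a term query. The first result shows a partial synonym (“big”) matching the document. The second shows a phrase query that spans the synonym boundary failing, because “big apple” occupies two positions where “new york city” occupies three.</p>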
<p>This remains an open issue as of 16 May 2012.</p>
<p>It is also an <strong>important</strong> open issue.</p>
<p>Think about it.</p>
<p>As “big data” grows larger, at some point traditional ETL (extract, transform, load) will stop being practical. Whether because of storage, performance, selective granularity or other issues, ETL is going to fade into the sunset.</p>
<p>Indexing, on the other hand, which treats data “<em>in situ</em>” (“in position” for you non-archaeologists in the audience), avoids many of the issues with ETL.</p>
<p>The treatment of synonyms (synonyms across data sets, multi-word synonyms, specifying the ranges of synonyms for both indexing and search, synonym expansion, a whole range of synonym features and capabilities) needs to “man up” to take on “big data.”</p></div>
    </content>
    <updated>2012-05-16T14:32:44Z</updated>
    <published>2012-05-16T14:32:44Z</published>
    <category scheme="http://tm.durusau.net" term="Indexing"/>
    <category scheme="http://tm.durusau.net" term="Lucene"/>
    <category scheme="http://tm.durusau.net" term="Synonymy"/>
    <author>
      <name>Patrick Durusau</name>
    </author>
    <source>
      <id>http://tm.durusau.net/?feed=atom</id>
      <link href="http://tm.durusau.net" rel="alternate" type="text/html"/>
      <link href="http://tm.durusau.net/?feed=atom" rel="self" type="application/atom+xml"/>
      <subtitle xml:lang="en">Patrick Durusau on Topic Maps and Semantic Diversity</subtitle>
      <title xml:lang="en">Another Word For It</title>
      <updated>2012-05-16T20:50:27Z</updated>
    </source>
  </entry>

  <entry xml:lang="en">
    <id>http://tm.durusau.net/?p=25246</id>
    <link href="http://tm.durusau.net/?p=25246" rel="alternate" type="text/html"/>
    <link href="http://tm.durusau.net/?p=25246#comments" rel="replies" type="text/html"/>
    <link href="http://tm.durusau.net/?feed=atom&amp;p=25246" rel="replies" type="application/atom+xml"/>
    <title xml:lang="en">Managing context data for diverse operating spaces</title>
    <summary xml:lang="en">Managing context data for diverse operating spaces by Wenwei Xue, Hung Keng Pung, and Shubhabrata Sen. Abstract: Context-aware computing is an exciting paradigm in which applications perceive and react to changing environments in an unattended manner. To enable behavioral adaptation, a context-aware application must dynamically acquire context data from different operating spaces in the real [...]</summary>
    <content type="xhtml" xml:lang="en"><div xmlns="http://www.w3.org/1999/xhtml"><p><a href="http://dx.doi.org/10.1016/j.pmcj.2011.11.001">Managing context data for diverse operating spaces</a> by Wenwei Xue, Hung Keng Pung, and Shubhabrata Sen.</p>
<p>Abstract:</p>
<blockquote><p>Context-aware computing is an exciting paradigm in which applications perceive and react to changing environments in an unattended manner. To enable behavioral adaptation, a context-aware application must dynamically acquire context data from different operating spaces in the real world, such as homes, shops and persons. Motivated by the sheer number and diversity of operating spaces, we propose a scalable context data management system in this paper to facilitate data acquisition from these spaces. In our system, we design a gateway framework for all operating spaces and develop matching algorithms to integrate the local context schemas of operating spaces into a global set of domain schemas upon which SQL-based context queries can be issued from applications. The system organizes the operating space gateways as peers in semantic overlay networks and employs distributed query processing techniques over these overlays. Evaluation results on a prototype implementation demonstrate the effectiveness of our system design.</p></blockquote>
<p>This article came up in a sweep for “semantic overlay networks.” </p>
<p>It is encouraging to see recognition that results may need to vary based on physical context. Who knows? Perhaps we will also see recognition that the terminology of one domain and its journals/authors/monographs has different semantics than that of other domains.</p>
<p>Imagine that: a system that manages queries across semantic domains for users, rather than requiring users to understand every possible semantic domain in advance to get useful (or better) query results.</p>
<p>Perhaps the “context” metaphor would be useful in marketing topic maps. It is less aggressive than “silo.” Let the client come up with that term to characterize competing agencies or information sources.</p>
<p>“Context” in the sense of physical space is popular among the smartphone crowd, so don’t neglect that as an avenue for topic maps as well. (Looking at your surroundings would mean breaking eye contact with your phone. Might miss an ad or something.)</p></div>
    </content>
    <updated>2012-05-16T10:31:10Z</updated>
    <published>2012-05-16T10:31:10Z</published>
    <category scheme="http://tm.durusau.net" term="Context-aware"/>
    <category scheme="http://tm.durusau.net" term="Semantic Overlay Network"/>
    <author>
      <name>Patrick Durusau</name>
    </author>
    <source>
      <id>http://tm.durusau.net/?feed=atom</id>
      <link href="http://tm.durusau.net" rel="alternate" type="text/html"/>
      <link href="http://tm.durusau.net/?feed=atom" rel="self" type="application/atom+xml"/>
      <subtitle xml:lang="en">Patrick Durusau on Topic Maps and Semantic Diversity</subtitle>
      <title xml:lang="en">Another Word For It</title>
      <updated>2012-05-16T20:50:27Z</updated>
    </source>
  </entry>

  <entry xml:lang="en">
    <id>http://tm.durusau.net/?p=25243</id>
    <link href="http://tm.durusau.net/?p=25243" rel="alternate" type="text/html"/>
    <link href="http://tm.durusau.net/?p=25243#comments" rel="replies" type="text/html"/>
    <link href="http://tm.durusau.net/?feed=atom&amp;p=25243" rel="replies" type="application/atom+xml"/>
    <title xml:lang="en">No sorting and lack of structure undermine a chart</title>
    <summary xml:lang="en">No sorting and lack of structure undermine a chart Kaiser Fung takes the Guardian newspaper, yes, that Guardian, to task for poor graphics on gay rights in the United States. When people criticize your graphics, take heart that even experts fail from time to time.</summary>
    <content type="xhtml" xml:lang="en"><div xmlns="http://www.w3.org/1999/xhtml"><p><a href="http://junkcharts.typepad.com/junk_charts/2012/05/no-sorting-and-lack-of-structure-undermine-a-chart.html">No sorting and lack of structure undermine a chart</a></p>
<p>Kaiser Fung takes the Guardian newspaper, yes, that Guardian, to task for poor graphics on gay rights in the United States.</p>
<p>When people criticize your graphics, take heart that even experts fail from time to time.</p></div>
    </content>
    <updated>2012-05-16T00:27:17Z</updated>
    <published>2012-05-16T00:27:17Z</published>
    <category scheme="http://tm.durusau.net" term="Graphics"/>
    <category scheme="http://tm.durusau.net" term="Visualization"/>
    <author>
      <name>Patrick Durusau</name>
    </author>
    <source>
      <id>http://tm.durusau.net/?feed=atom</id>
      <link href="http://tm.durusau.net" rel="alternate" type="text/html"/>
      <link href="http://tm.durusau.net/?feed=atom" rel="self" type="application/atom+xml"/>
      <subtitle xml:lang="en">Patrick Durusau on Topic Maps and Semantic Diversity</subtitle>
      <title xml:lang="en">Another Word For It</title>
      <updated>2012-05-16T18:52:18Z</updated>
    </source>
  </entry>

  <entry xml:lang="en">
    <id>http://tm.durusau.net/?p=25240</id>
    <link href="http://tm.durusau.net/?p=25240" rel="alternate" type="text/html"/>
    <link href="http://tm.durusau.net/?p=25240#comments" rel="replies" type="text/html"/>
    <link href="http://tm.durusau.net/?feed=atom&amp;p=25240" rel="replies" type="application/atom+xml"/>
    <title xml:lang="en">History matters</title>
    <summary xml:lang="en">History matters by Gene Golovchinsky. Whose history? Your history. Your search history. Visualized. Interested? Read more: Exploratory search is an uncertain endeavor. Quite often, people don’t know exactly how to express their information need, and that need may evolve over time as information is discovered and understood. This is not news. When people search for [...]</summary>
    <content type="xhtml" xml:lang="en"><div xmlns="http://www.w3.org/1999/xhtml"><p><a href="http://palblog.fxpal.com/?p=5435">History matters</a> by Gene Golovchinsky.</p>
<p>Whose history? Your history. Your search history. Visualized. </p>
<p>Interested? Read more:</p>
<blockquote><p>Exploratory search is an uncertain endeavor. Quite often, people don’t know exactly how to express their information need, and that need may evolve over time as information is discovered and understood. This is not news.</p>
<p>When people search for information, they often run multiple queries to get at different aspects of the information need, to gain a better understanding of the collection, or to incorporate newly-found information into their searches. This too is not news.</p>
<p>The multiple queries that people run may well retrieve some of the same documents. In some cases, there may be little or no overlap between query results; at other times, the overlap may be considerable. Yet most search engines treat each query as an independent event, and leave it to the searcher to make sense of the results. This, to me, is an opportunity.</p>
<blockquote><p>Design goal: Help people plan future actions by understanding the present in the context of the past.
</p></blockquote>
<p>While web search engines such as Bing make it easy for people to re-visit some recent queries, and early systems such as Dialog allowed Boolean queries to be constructed by combining results of previously-executed queries, these approaches do not help people make sense of the retrieval histories of specific documents with respect to a particular information need. There is nothing new under the sun, however: Mark Sanderson’s <a href="http://www.seg.rmit.edu.au/mark/publications/my_papers/EP-odd.pdf">NRT system</a> flagged documents as having been previously retrieved for a given search task, <a href="http://fxpal.com/?p=abstract&amp;abstractID=84">VOIR</a> used retrieval histograms for each document, and of course a browser maintains a limited history of activity to indicate which links were followed.</p>
<p>Our recent work in Querium (see <a href="http://fxpal.com/?p=abstract&amp;abstractID=667">here</a> and <a href="http://fxpal.com/?p=abstract&amp;abstractID=672">here</a>) seeks to explore this space further by providing searchers with tools that reflect patterns of retrieval of specific documents within a search mission.
</p></blockquote>
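<p>The per-document retrieval history described above can be sketched in a few lines of Python. This is a hypothetical illustration, not how Querium, VOIR or NRT are implemented. It assumes a search session is a list of result lists, one per query, and the function names are made up for the example.</p>
<pre>
```python
def retrieval_histories(query_results):
    """Map each document to the indexes of the queries that retrieved it.

    query_results is a list of result lists, one per query, in the order
    the queries were run during one search session.
    """
    history = {}
    for q_index, results in enumerate(query_results):
        for doc in results:
            history.setdefault(doc, []).append(q_index)
    return history


def previously_seen(query_results, q_index):
    """Documents returned by query q_index that an earlier query also returned."""
    earlier = set()
    for results in query_results[:q_index]:
        earlier.update(results)
    return [doc for doc in query_results[q_index] if doc in earlier]


session = [
    ["d1", "d2", "d3"],  # query 0
    ["d2", "d4"],        # query 1
    ["d2", "d3", "d5"],  # query 2
]
hist = retrieval_histories(session)
print(hist["d2"])                   # [0, 1, 2]: retrieved by every query
print(hist["d4"])                   # [1]
print(previously_seen(session, 2))  # ['d2', 'd3']
```
</pre>
<p>The first function records which queries retrieved each document, which is the raw material for a per-document retrieval histogram. The second flags documents that an earlier query in the same session already returned.</p>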
<p>Even more interested? Read Gene’s post in full.</p>
<p>If not, check your pulse.</p></div>
    </content>
    <updated>2012-05-16T00:17:25Z</updated>
    <published>2012-05-16T00:17:25Z</published>
    <category scheme="http://tm.durusau.net" term="Search Behavior"/>
    <category scheme="http://tm.durusau.net" term="Search Engines"/>
    <category scheme="http://tm.durusau.net" term="Search History"/>
    <author>
      <name>Patrick Durusau</name>
    </author>
    <source>
      <id>http://tm.durusau.net/?feed=atom</id>
      <link href="http://tm.durusau.net" rel="alternate" type="text/html"/>
      <link href="http://tm.durusau.net/?feed=atom" rel="self" type="application/atom+xml"/>
      <subtitle xml:lang="en">Patrick Durusau on Topic Maps and Semantic Diversity</subtitle>
      <title xml:lang="en">Another Word For It</title>
      <updated>2012-05-16T18:52:18Z</updated>
    </source>
  </entry>

  <entry xml:lang="en">
    <id>http://tm.durusau.net/?p=25237</id>
    <link href="http://tm.durusau.net/?p=25237" rel="alternate" type="text/html"/>
    <link href="http://tm.durusau.net/?p=25237#comments" rel="replies" type="text/html"/>
    <link href="http://tm.durusau.net/?feed=atom&amp;p=25237" rel="replies" type="application/atom+xml"/>
    <title xml:lang="en">SIAM Data Mining 2012 Conference</title>
    <summary xml:lang="en">SIAM Data Mining 2012 Conference Ryan Rosario writes: From April 26-28 I had the pleasure to attend the SIAM Data Mining conference in Anaheim on the Disneyland Resort grounds. Aside from KDD2011, most of my recent conferences had been more “big data” and “data science” oriented, and I wanted to step away from the hype [...]</summary>
    <content type="xhtml" xml:lang="en"><div xmlns="http://www.w3.org/1999/xhtml"><p><a href="http://www.bytemining.com/2012/05/siam-data-mining-2012-conference/">SIAM Data Mining 2012 Conference</a></p>
<p>Ryan Rosario writes:</p>
<blockquote><p>From April 26-28 I had the pleasure to attend the <a href="http://www.siam.org/meetings/sdm12/">SIAM Data Mining conference in Anaheim on the Disneyland Resort</a> grounds. Aside from <a href="http://www.sigkdd.org/kdd2011/">KDD2011</a>, most of my recent conferences had been more “big data” and “data science” oriented, and I wanted to step away from the hype and just listen to talks that had more substance.</p>
<p>Attending a conference on Disneyland property was quite a bizarre experience. I wanted to get everything I could out of the conference, but the weather was so nice that I also wanted to get everything out of Disneyland as I could. Seeing adults wearing Mickey ears carrying Mickey shaped balloons, and seeing girls dressed up as their favorite Disney princesses screams “fun” rather than “business”, but I managed to make time for both.</p>
<p>The first two days started with a plenary talk from industry or research labs. After a coffee break, there were the usual breakout sessions followed by lunch. During my free 90 minutes, I ran over to Disneyland and California Adventure both days to eat lunch. I managed to run there, wait in line, guide myself through crowds, wait in line, get my food, eat it, and run back to the conference in 90 minutes on a weekend. After lunch on the first two days was another plenary session followed by breakout sessions. The evening of the first two days was reserved for poster sessions. Saturday hosted half-day and full-day workshops.</p>
<p>Below is my summary of the conference. Of course, such a summary is very high level my description may miss things, or may not be entirely correct if I misunderstood the speaker.</p></blockquote>
<p>I doubt Ryan would claim his summary is “as good as being there,” but if you could not attend, you could do far worse.</p>
<p>Suggestions of papers from the conference that I should read first?</p></div>
    </content>
    <updated>2012-05-16T00:04:51Z</updated>
    <published>2012-05-16T00:04:51Z</published>
    <category scheme="http://tm.durusau.net" term="Conferences"/>
    <category scheme="http://tm.durusau.net" term="Data Mining"/>
    <author>
      <name>Patrick Durusau</name>
    </author>
    <source>
      <id>http://tm.durusau.net/?feed=atom</id>
      <link href="http://tm.durusau.net" rel="alternate" type="text/html"/>
      <link href="http://tm.durusau.net/?feed=atom" rel="self" type="application/atom+xml"/>
      <subtitle xml:lang="en">Patrick Durusau on Topic Maps and Semantic Diversity</subtitle>
      <title xml:lang="en">Another Word For It</title>
      <updated>2012-05-16T18:52:18Z</updated>
    </source>
  </entry>

  <entry xml:lang="en">
    <id>http://tm.durusau.net/?p=25233</id>
    <link href="http://tm.durusau.net/?p=25233" rel="alternate" type="text/html"/>
    <link href="http://tm.durusau.net/?p=25233#comments" rel="replies" type="text/html"/>
    <link href="http://tm.durusau.net/?feed=atom&amp;p=25233" rel="replies" type="application/atom+xml"/>
    <title xml:lang="en">Using “Punning” to Answer httpRange-14</title>
    <summary xml:lang="en">Using “Punning” to Answer httpRange-14 Jeni Tennison writes in her introduction: As part of the TAG’s work on httpRange-14, Jonathan Rees has assessed how a variety of use cases could be met by various proposals put before the TAG. The results of the assessment are a matrix which shows that “punning” is the most promising [...]</summary>
    <content type="xhtml" xml:lang="en"><div xmlns="http://www.w3.org/1999/xhtml"><p><a href="http://www.jenitennison.com/blog/node/170">Using “Punning” to Answer httpRange-14</a></p>
<p>Jeni Tennison writes in her introduction:</p>
<blockquote><p>As part of the TAG’s work on httpRange-14, <a href="http://mumble.net/~jar/">Jonathan Rees</a> has assessed how a variety of <a href="http://www.w3.org/wiki/HTTPURIUseCases">use cases</a> could be met by various <a href="http://www.w3.org/wiki/TagIssue57Responses">proposals</a> put before the TAG. The results of the assessment are a <a href="http://www.w3.org/wiki/HTTPURIUseCaseMatrix">matrix</a> which shows that “punning” is the most promising method, unique in not failing on either <a href="http://www.w3.org/wiki/HTTPURIUseCases#J.29_Naive_linked_data_on_hosting_service">ease of use (use case J)</a> or <a href="http://www.w3.org/wiki/HTTPURIUseCases#M.29_HTTP_consistency">HTTP consistency (use case M).</a></p>
<p>In normal use, “punning” is about making jokes based around a word that has two meanings. In this context, “punning” is about using the same URI to mean two (or more) different things. It’s most commonly used as a term of art in OWL but normal people don’t need to worry particularly about that use. Here I’ll explore what that might actually mean as an approach to the httpRange-14 issue.</p></blockquote>
<p>Jeni writes quite well, and if you are really interested in the details of this self-inflicted wound, read her post in its entirety.</p>
<p>She sums up the post when she says:</p>
<blockquote><p>Thus an implication of this approach is that the people who define languages and vocabularies must specify what aspect of a resource a URI used in a particular way identifies.</p></blockquote>
<p>Her proposal makes disambiguation explicit, a strategy more likely to succeed than the alternatives.</p>
<p>Following that statement, she explores how to proceed usefully from that position. (There is no guarantee her position will carry the day, but it would be a good thing if it does.)</p></div>
    </content>
    <updated>2012-05-15T23:50:43Z</updated>
    <published>2012-05-15T23:50:43Z</published>
    <category scheme="http://tm.durusau.net" term="Linked Data"/>
    <category scheme="http://tm.durusau.net" term="RDF"/>
    <category scheme="http://tm.durusau.net" term="Semantic Web"/>
    <author>
      <name>Patrick Durusau</name>
    </author>
    <source>
      <id>http://tm.durusau.net/?feed=atom</id>
      <link href="http://tm.durusau.net" rel="alternate" type="text/html"/>
      <link href="http://tm.durusau.net/?feed=atom" rel="self" type="application/atom+xml"/>
      <subtitle xml:lang="en">Patrick Durusau on Topic Maps and Semantic Diversity</subtitle>
      <title xml:lang="en">Another Word For It</title>
      <updated>2012-05-16T17:58:27Z</updated>
    </source>
  </entry>

  <entry xml:lang="en">
    <id>http://tm.durusau.net/?p=25230</id>
    <link href="http://tm.durusau.net/?p=25230" rel="alternate" type="text/html"/>
    <link href="http://tm.durusau.net/?p=25230#comments" rel="replies" type="text/html"/>
    <link href="http://tm.durusau.net/?feed=atom&amp;p=25230" rel="replies" type="application/atom+xml"/>
    <title xml:lang="en">Open Data Visualization: Keeping Traces of the Exploration Process</title>
    <summary xml:lang="en">Open Data Visualization: Keeping Traces of the Exploration Process by Benoît Otjacques, Mickaël Stefas, Maël Cornil, and Fernand Feltz. Abstract: This paper describes a system to support the visual exploration of Open Data. During his/her interactive experience with the graphics, the user can easily store the current complete state of the visualization application (called a [...]</summary>
    <content type="xhtml" xml:lang="en"><div xmlns="http://www.w3.org/1999/xhtml"><p><a href="http://arxiv.org/abs/1205.2476">Open Data Visualization: Keeping Traces of the Exploration Process</a> by Benoît Otjacques, Mickaël Stefas, Maël Cornil, and Fernand Feltz.</p>
<p>Abstract:</p>
<blockquote><p>This paper describes a system to support the visual exploration of Open Data. During his/her interactive experience with the graphics, the user can easily store the current complete state of the visualization application (called a viewpoint). Next, he/she can compose sequences of these viewpoints (called scenarios) that can easily be reloaded. This feature allows to keep traces of a former exploration process, which can be useful in single user (to support investigation carried out in multiple sessions) as well as in collaborative setting (to share points of interest identified in the data set).</p></blockquote>
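<p>Viewpoints and scenarios can be pictured as data. A viewpoint is a snapshot of the complete application state, and a scenario is an ordered list of viewpoints that can be saved and reloaded. The sketch below assumes the state fits in a JSON-serializable dictionary. The class names, fields and JSON layout are hypothetical and are not taken from the paper’s system.</p>
<pre>
```python
import copy
import json
from dataclasses import dataclass, field


@dataclass
class Viewpoint:
    """A stored snapshot of the complete visualization state (hypothetical fields)."""
    name: str
    state: dict


@dataclass
class Scenario:
    """An ordered sequence of viewpoints that can be saved and replayed."""
    title: str
    viewpoints: list = field(default_factory=list)

    def capture(self, name, current_state):
        # Deep-copy so later interaction does not alter the stored snapshot.
        self.viewpoints.append(Viewpoint(name, copy.deepcopy(current_state)))

    def to_json(self):
        return json.dumps({
            "title": self.title,
            "viewpoints": [{"name": v.name, "state": v.state} for v in self.viewpoints],
        })

    @classmethod
    def from_json(cls, text):
        data = json.loads(text)
        return cls(data["title"],
                   [Viewpoint(v["name"], v["state"]) for v in data["viewpoints"]])


state = {"chart": "bar", "filters": {"year": 2011}}
scenario = Scenario("budget exploration")
scenario.capture("start", state)
state["filters"]["year"] = 2012
scenario.capture("after filter change", state)
restored = Scenario.from_json(scenario.to_json())
print([v.state["filters"]["year"] for v in restored.viewpoints])  # [2011, 2012]
```
</pre>
<p>The deep copy at capture time matters most here. Without it, every stored viewpoint would silently track later changes to the live state.</p>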
<p>I was unaware of this paper when I wrote my “knowledge toilet” post earlier today. This looks like an interesting starting point for discussion.</p>
<p>Just speculating, but I think there will be a “sweet spot” for how much effort users will devote to recording their input. For some purposes, recording will need to be almost automatic, like the relationship between search terms and the links users choose: crude but somewhat effective.</p>
<p>On the other hand, there will be professional researchers/authors who want to sell their semantic annotations/mappings of resources.</p>
<p>And applications/use cases in between. </p></div>
    </content>
    <updated>2012-05-15T21:49:58Z</updated>
    <published>2012-05-15T21:49:58Z</published>
    <category scheme="http://tm.durusau.net" term="Open Data"/>
    <category scheme="http://tm.durusau.net" term="Visualization"/>
    <author>
      <name>Patrick Durusau</name>
    </author>
    <source>
      <id>http://tm.durusau.net/?feed=atom</id>
      <link href="http://tm.durusau.net" rel="alternate" type="text/html"/>
      <link href="http://tm.durusau.net/?feed=atom" rel="self" type="application/atom+xml"/>
      <subtitle xml:lang="en">Patrick Durusau on Topic Maps and Semantic Diversity</subtitle>
      <title xml:lang="en">Another Word For It</title>
      <updated>2012-05-16T17:58:27Z</updated>
    </source>
  </entry>

  <entry xml:lang="en">
    <id>http://tm.durusau.net/?p=25226</id>
    <link href="http://tm.durusau.net/?p=25226" rel="alternate" type="text/html"/>
    <link href="http://tm.durusau.net/?p=25226#comments" rel="replies" type="text/html"/>
    <link href="http://tm.durusau.net/?feed=atom&amp;p=25226" rel="replies" type="application/atom+xml"/>
    <title xml:lang="en">Operations on soft sets revisited</title>
    <summary xml:lang="en">Operations on soft sets revisited by Ping Zhu and Qiaoyan Wen. Abstract: Soft sets, as a mathematical tool for dealing with uncertainty, have recently gained considerable attention, including some successful applications in information processing, decision, demand analysis, and forecasting. To construct new soft sets from given soft sets, some operations on soft sets have been [...]</summary>
    <content type="xhtml" xml:lang="en"><div xmlns="http://www.w3.org/1999/xhtml"><p><a href="http://arxiv.org/abs/1205.2857">Operations on soft sets revisited</a> by Ping Zhu and Qiaoyan Wen.</p>
<p>Abstract:</p>
<blockquote><p>Soft sets, as a mathematical tool for dealing with uncertainty, have recently gained considerable attention, including some successful applications in information processing, decision, demand analysis, and forecasting. To construct new soft sets from given soft sets, some operations on soft sets have been proposed. Unfortunately, such operations cannot keep all classical set-theoretic laws true for soft sets. In this paper, we redefine the intersection, complement, and difference of soft sets and investigate the algebraic properties of these operations along with a known union operation. We find that the new operation system on soft sets inherits all basic properties of operations on classical sets, which justifies our definitions. </p></blockquote>
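<p>For readers new to soft sets: a soft set over a universe U is a parameterized family of subsets of U, with one subset for each parameter. The Python sketch below represents a soft set as a dictionary from parameter to a frozenset of elements. It implements one union operation from the soft-set literature: parameters found in only one set keep their subset, and shared parameters take the union of both subsets. It does not reproduce the paper’s redefined intersection, complement or difference. The houses example and the function names are hypothetical.</p>
<pre>
```python
def make_soft_set(universe, approx):
    """Build a soft set: a dict mapping each parameter e to a subset F(e) of the universe."""
    for e, members in approx.items():
        if not set(members).issubset(universe):
            raise ValueError(f"F({e!r}) contains elements outside the universe")
    return {e: frozenset(m) for e, m in approx.items()}


def soft_union(f, g):
    """Union of two soft sets over the same universe.

    A parameter that appears in only one soft set keeps its subset;
    a parameter shared by both gets the union of the two subsets.
    """
    empty = frozenset()
    return {e: f.get(e, empty).union(g.get(e, empty)) for e in set(f).union(g)}


# Hypothetical example: houses described by two sets of parameters.
U = {"h1", "h2", "h3", "h4"}
F = make_soft_set(U, {"cheap": {"h1", "h2"}, "wooden": {"h2", "h3"}})
G = make_soft_set(U, {"cheap": {"h3"}, "modern": {"h4"}})
H = soft_union(F, G)
print(sorted(H["cheap"]))   # ['h1', 'h2', 'h3']
print(sorted(H["wooden"]))  # ['h2', 'h3']
print(sorted(H["modern"]))  # ['h4']
```
</pre>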
<p>An interesting paper that will get you interested in soft sets if you aren’t already.</p>
<p>It isn’t easy going, even with the Alice and Bob examples, which I am sure the authors found immediately intuitive. </p>
<p>If you have data where numeric values cannot be assigned, it will be worth your while to explore this paper and the literature on soft sets.</p></div>
    </content>
    <updated>2012-05-15T20:59:39Z</updated>
    <published>2012-05-15T20:59:39Z</published>
    <category scheme="http://tm.durusau.net" term="Sets"/>
    <category scheme="http://tm.durusau.net" term="Soft Sets"/>
    <author>
      <name>Patrick Durusau</name>
    </author>
    <source>
      <id>http://tm.durusau.net/?feed=atom</id>
      <link href="http://tm.durusau.net" rel="alternate" type="text/html"/>
      <link href="http://tm.durusau.net/?feed=atom" rel="self" type="application/atom+xml"/>
      <subtitle xml:lang="en">Patrick Durusau on Topic Maps and Semantic Diversity</subtitle>
      <title xml:lang="en">Another Word For It</title>
      <updated>2012-05-16T17:07:59Z</updated>
    </source>
  </entry>

  <entry xml:lang="en">
    <id>http://tm.durusau.net/?p=25221</id>
    <link href="http://tm.durusau.net/?p=25221" rel="alternate" type="text/html"/>
    <link href="http://tm.durusau.net/?p=25221#comments" rel="replies" type="text/html"/>
    <link href="http://tm.durusau.net/?feed=atom&amp;p=25221" rel="replies" type="application/atom+xml"/>
    <title xml:lang="en">Improving Schema Matching with Linked Data (Flushing the Knowledge Toilet)</title>
    <summary xml:lang="en">Improving Schema Matching with Linked Data by Ahmad Assaf, Eldad Louw, Aline Senart, Corentin Follenfant, Raphaël Troncy, and David Trastour. Abstract: With today’s public data sets containing billions of data items, more and more companies are looking to integrate external data with their traditional enterprise data to improve business intelligence analysis. These distributed data sources [...]</summary>
    <content type="xhtml" xml:lang="en"><div xmlns="http://www.w3.org/1999/xhtml"><p><a href="http://arxiv.org/abs/1205.2691">Improving Schema Matching with Linked Data</a> by Ahmad Assaf, Eldad Louw, Aline Senart, Corentin Follenfant, Raphaël Troncy, and David Trastour.</p>
<p>Abstract:</p>
<blockquote><p>With today’s public data sets containing billions of data items, more and more companies are looking to integrate external data with their traditional enterprise data to improve business intelligence analysis. These distributed data sources however exhibit heterogeneous data formats and terminologies and may contain noisy data. In this paper, we present a novel framework that enables business users to semi-automatically perform data integration on potentially noisy tabular data. This framework offers an extension to Google Refine with novel schema matching algorithms leveraging Freebase rich types. First experiments show that using Linked Data to map cell values with instances and column headers with types improves significantly the quality of the matching results and therefore should lead to more informed decisions. </p></blockquote>
<p>Personally, I don’t find mapping Airport -&gt; Airport Code a very convincing demonstration.</p>
<p>The other problem I have is what happens after a user “accepts” a mapping?</p>
<p>Now what?</p>
<p>I can contribute my expertise to mappings between diverse schemas all day, even public ones. </p>
<p>What happens to all that human effort?</p>
<p>It is what I call the “<strong>knowledge toilet</strong>” approach to information retrieval/integration.</p>
<p>Software runs (I can’t count the number of times integration software has been run on Citeseer. Can you?) and a user corrects the results as best they are able. </p>
<p>Now what?</p>
<p>Oh, yeah, the next user or group of users does it all over again.</p>
<p>Why?</p>
<p>Because the user before them flushed the knowledge toilet.</p>
<p>The information had been mapped, possibly even hand-corrected by one or more users. Then it is just tossed away.</p>
<p>That has to seem wrong at some very fundamental level, whatever semantic technology you choose to use.</p>
<p>I’m open to suggestions.</p>
<p><strong>How do we stop flushing the knowledge toilet?</strong></p></div>
    </content>
    <updated>2012-05-15T20:40:16Z</updated>
    <published>2012-05-15T20:40:16Z</published>
    <category scheme="http://tm.durusau.net" term="Linked Data"/>
    <category scheme="http://tm.durusau.net" term="Schema"/>
    <author>
      <name>Patrick Durusau</name>
    </author>
    <source>
      <id>http://tm.durusau.net/?feed=atom</id>
      <link href="http://tm.durusau.net" rel="alternate" type="text/html"/>
      <link href="http://tm.durusau.net/?feed=atom" rel="self" type="application/atom+xml"/>
      <subtitle xml:lang="en">Patrick Durusau on Topic Maps and Semantic Diversity</subtitle>
      <title xml:lang="en">Another Word For It</title>
      <updated>2012-05-16T15:20:56Z</updated>
    </source>
  </entry>

  <entry xml:lang="en">
    <id>http://tm.durusau.net/?p=25217</id>
    <link href="http://tm.durusau.net/?p=25217" rel="alternate" type="text/html"/>
    <link href="http://tm.durusau.net/?p=25217#comments" rel="replies" type="text/html"/>
    <link href="http://tm.durusau.net/?feed=atom&amp;p=25217" rel="replies" type="application/atom+xml"/>
    <title xml:lang="en">Introducing Neo4j into a Relational Database Organisation</title>
    <summary xml:lang="en">Introducing Neo4j into a Relational Database Organisation The details: What: Neo4J User Group:Introducing Neo4j into a Relational Database Organisation Where: The Skills Matter eXchange, London When: 23 May 2012 Starts at 18:30 From the webpage: This month, Toby O’Rourke and Michael McCarthy present their experiences of introducing Neo4j into Gamesys: a Relational Database Organisation. You [...]</summary>
    <content type="xhtml" xml:lang="en"><div xmlns="http://www.w3.org/1999/xhtml"><p><a href="http://skillsmatter.com/podcast/nosql/neo4j-at-gamesys">Introducing Neo4j into a Relational Database Organisation</a></p>
<p>The details:</p>
<p><strong>What: Neo4j User Group: Introducing Neo4j into a Relational Database Organisation<br/>
Where: The Skills Matter eXchange, London<br/>
When: 23 May 2012, starts at 18:30</strong></p>
<p>From the webpage:</p>
<blockquote><p>This month, Toby O’Rourke and Michael McCarthy present their experiences of introducing Neo4j into Gamesys: a Relational Database Organisation.</p>
<p>You will hear about Toby and Michael’s experiences, including</p>
<ul>
<li>the path taken from spring data through tinkerpop, to straight neo then spring data again</li>
<li>Satisfying the reporting requirements of a place built on a data warehouse approach</li>
<li>Modelling our domain</li>
<li>Experience of support contracts and the community as a whole</li>
</ul>
</blockquote>
<p>Just in case you need an additional reason to be in London on 23 May 2012, consult <a href="http://www.londondrum.com/">London Drum City Guide</a>. <img alt=";-)" class="wp-smiley" src="http://tm.durusau.net/wp-includes/images/smilies/icon_wink.gif"/> </p></div>
    </content>
    <updated>2012-05-15T19:20:26Z</updated>
    <published>2012-05-15T19:20:26Z</published>
    <category scheme="http://tm.durusau.net" term="Neo4j"/>
    <category scheme="http://tm.durusau.net" term="RDBMS"/>
    <author>
      <name>Patrick Durusau</name>
    </author>
    <source>
      <id>http://tm.durusau.net/?feed=atom</id>
      <link href="http://tm.durusau.net" rel="alternate" type="text/html"/>
      <link href="http://tm.durusau.net/?feed=atom" rel="self" type="application/atom+xml"/>
      <subtitle xml:lang="en">Patrick Durusau on Topic Maps and Semantic Diversity</subtitle>
      <title xml:lang="en">Another Word For It</title>
      <updated>2012-05-16T14:32:44Z</updated>
    </source>
  </entry>

  <entry xml:lang="en">
    <id>http://tm.durusau.net/?p=25214</id>
    <link href="http://tm.durusau.net/?p=25214" rel="alternate" type="text/html"/>
    <link href="http://tm.durusau.net/?p=25214#comments" rel="replies" type="text/html"/>
    <link href="http://tm.durusau.net/?feed=atom&amp;p=25214" rel="replies" type="application/atom+xml"/>
    <title xml:lang="en">Electronic Discovery Institute</title>
    <summary xml:lang="en">Electronic Discovery Institute From the home page: The Electronic Discovery Institute is a non-profit organization dedicated to resolving electronic discovery challenges by conducting studies of litigation processes that incorporate modern technologies. The explosion in volume of electronically stored information and the complexity of its discovery overwhelms the litigation process and the justice system. Technology and [...]</summary>
    <content type="xhtml" xml:lang="en"><div xmlns="http://www.w3.org/1999/xhtml"><p><a href="http://www.ediscoveryinstitute.org/">Electronic Discovery Institute</a></p>
<p>From the home page:</p>
<blockquote><p>The Electronic Discovery Institute is a non-profit organization dedicated to resolving electronic discovery challenges by conducting studies of litigation processes that incorporate modern technologies. The explosion in volume of electronically stored information and the complexity of its discovery overwhelms the litigation process and the justice system. Technology and efficient processes can ease the impact of electronic discovery.</p>
<p>The Institute operates under the guidance of an independent Board of Diplomats comprised of judges, lawyers and technical experts. The Institute’s studies will measure the relative merits of new discovery technologies and methods. The results of the Institute’s studies will be shared with the public free of charge. In order to obtain our free publications, you must create a free log-in with a legitimate user profile. We do not sell your information. Please visit our sponsors – as they provide altruistic support to our organization.</p></blockquote>
<p>I encountered the Electronic Discovery Institute while researching electronic discovery. Since law was, and still is, an interest of mine, I wanted to record it here.</p>
<p>The area of e-discovery is under rapid development in terms of the rules that govern it, the technology it employs, and its practice in real-world situations with consequences for the players.</p>
<p>I commend this site/organization to anyone interested in e-discovery issues.</p></div>
    </content>
    <updated>2012-05-15T19:03:55Z</updated>
    <published>2012-05-15T19:03:55Z</published>
    <category scheme="http://tm.durusau.net" term="Law"/>
    <category scheme="http://tm.durusau.net" term="Legal Informatics"/>
    <author>
      <name>Patrick Durusau</name>
    </author>
    <source>
      <id>http://tm.durusau.net/?feed=atom</id>
      <link href="http://tm.durusau.net" rel="alternate" type="text/html"/>
      <link href="http://tm.durusau.net/?feed=atom" rel="self" type="application/atom+xml"/>
      <subtitle xml:lang="en">Patrick Durusau on Topic Maps and Semantic Diversity</subtitle>
      <title xml:lang="en">Another Word For It</title>
      <updated>2012-05-16T10:31:10Z</updated>
    </source>
  </entry>

  <entry xml:lang="en">
    <id>http://tm.durusau.net/?p=25210</id>
    <link href="http://tm.durusau.net/?p=25210" rel="alternate" type="text/html"/>
    <link href="http://tm.durusau.net/?p=25210#comments" rel="replies" type="text/html"/>
    <link href="http://tm.durusau.net/?feed=atom&amp;p=25210" rel="replies" type="application/atom+xml"/>
    <title xml:lang="en">Natural Language Processing – Nearly Universal Case?</title>
    <summary xml:lang="en">I was reading a paper on natural language processing (NLP) when it occurred to me to ask: When is parsing of any data not natural language processing? I hear the phrase, “natural language processing,” applied to a corpus of emails, blog posts, web pages, electronic texts, transcripts of international phone calls and the like. Other [...]</summary>
    <content type="xhtml" xml:lang="en"><div xmlns="http://www.w3.org/1999/xhtml"><p>I was reading a paper on natural language processing (NLP) when it occurred to me to ask: </p>
<p>When is parsing of any data not natural language processing?</p>
<p>I hear the phrase, “natural language processing,” applied to a corpus of emails, blog posts, web pages, electronic texts, transcripts of international phone calls and the like. </p>
<p>Other than following others out of habit, why do we say those are subject to “natural language processing?”</p>
<p>As opposed to, say, a database schema?</p>
<p>When we “process” the column headers in a database schema, aren’t we engaged in “natural language processing?” What about SGML/XML schemas or instances they govern?</p>
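<p>To make the column-header case concrete, here is a minimal Python sketch (an illustration added here, not part of any tool mentioned in the post). “Processing” a header means tokenizing it and resolving abbreviations, much as NLP tokenizes and normalizes prose. The abbreviation table and the <code>header_to_words</code> function are hypothetical.</p>
<pre>
```python
import re

# Hypothetical abbreviation table; a real schema would need its own.
ABBREVIATIONS = {
    "cust": "customer",
    "fname": "first name",
    "dob": "date of birth",
    "qty": "quantity",
}

def header_to_words(header: str) -> list[str]:
    """Split a column header into lowercase words, expanding known abbreviations."""
    # Put a space at camelCase boundaries, then split on underscores, hyphens and whitespace.
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", header)
    tokens = [t.lower() for t in re.split(r"[_\-\s]+", spaced) if t]
    words = []
    for token in tokens:
        # Unknown tokens pass through unchanged.
        words.extend(ABBREVIATIONS.get(token, token).split())
    return words

for header in ["cust_fname", "CustomerFirstName", "DOB", "order-qty"]:
    print(header, "->", header_to_words(header))
```
</pre>
<p>Under these assumptions, <code>cust_fname</code> and <code>CustomerFirstName</code> both normalize to the words “customer first name”: two spellings, one subject. That is a synonymy problem, not merely a parsing problem.</p>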
<p>Being mindful of semantics, synonymy and polysemy, it’s hard to think of examples that are not “natural language processing.”</p>
<p>At least for data that would be meaningful if read by a person. Streams of numbers perhaps not, but I would argue that the symbolism defining their processing falls under natural language processing.</p>
<p>Thoughts?</p></div>
    </content>
    <updated>2012-05-15T18:54:02Z</updated>
    <published>2012-05-15T18:54:02Z</published>
    <category scheme="http://tm.durusau.net" term="Natural Language Processing"/>
    <author>
      <name>Patrick Durusau</name>
    </author>
    <source>
      <id>http://tm.durusau.net/?feed=atom</id>
      <link href="http://tm.durusau.net" rel="alternate" type="text/html"/>
      <link href="http://tm.durusau.net/?feed=atom" rel="self" type="application/atom+xml"/>
      <subtitle xml:lang="en">Patrick Durusau on Topic Maps and Semantic Diversity</subtitle>
      <title xml:lang="en">Another Word For It</title>
      <updated>2012-05-16T00:27:17Z</updated>
    </source>
  </entry>

  <entry xml:lang="en">
    <id>http://tm.durusau.net/?p=25191</id>
    <link href="http://tm.durusau.net/?p=25191" rel="alternate" type="text/html"/>
    <link href="http://tm.durusau.net/?p=25191#comments" rel="replies" type="text/html"/>
    <link href="http://tm.durusau.net/?feed=atom&amp;p=25191" rel="replies" type="application/atom+xml"/>
    <title xml:lang="en">Mining GitHub – Followers in Tinkerpop</title>
    <summary xml:lang="en">Mining GitHub – Followers in Tinkerpop Patrick Wagstrom writes: Development of any moderately complex software package is a social process. Even if a project is developed entirely by a single person, there is still a social component that consists of all of the people who use the software, file bugs, and provide recommendations for enhancements. [...]</summary>
    <content type="xhtml" xml:lang="en"><div xmlns="http://www.w3.org/1999/xhtml"><p><a href="http://patrick.wagstrom.net/weblog/2012/05/13/mining-github-followers-in-tinkerpop/">Mining GitHub – Followers in Tinkerpop</a></p>
<p>Patrick Wagstrom writes:</p>
<blockquote><p>Development of any moderately complex software package is a social process. Even if a project is developed entirely by a single person, there is still a social component that consists of all of the people who use the software, file bugs, and provide recommendations for enhancements. This social aspect is one of the driving forces behind the proliferation of social software development sites such as <a href="http://www.github.com/">GitHub</a>, <a href="http://www.sourceforge.net/">SourceForge</a>, <a href="http://code.google.com/">Google Code</a>, and <a href="http://www.bitbucket.org/">BitBucket</a>.</p>
<p>These sites combine together a variety of tools that are common for software development such as version control, bug trackers, mailing lists, release management, project planning, and wikis. In addition, some of these have more social aspects that allow you find and follow individual developers or watch particular projects. In this post I’m going to show you how we can use some this information to gain insight into a software development community, specifically the community around the <a href="http://www.tinkerpop.com/">Tinkerpop</a> stack of tools for graph databases.</p></blockquote>
<p>GitHub as a social community. Who knew? <img alt=";-)" class="wp-smiley" src="http://tm.durusau.net/wp-includes/images/smilies/icon_wink.gif"/> </p>
<p>A very instructive walk-through of Gremlin, GraphML, and R with a prepared data set. It doesn’t get much better than this!</p></div>
    </content>
    <updated>2012-05-14T23:13:44Z</updated>
    <published>2012-05-14T23:13:44Z</published>
    <category scheme="http://tm.durusau.net" term="Github"/>
    <category scheme="http://tm.durusau.net" term="GraphML"/>
    <category scheme="http://tm.durusau.net" term="Neo4j"/>
    <category scheme="http://tm.durusau.net" term="R"/>
    <category scheme="http://tm.durusau.net" term="TinkerPop"/>
    <author>
      <name>Patrick Durusau</name>
    </author>
    <source>
      <id>http://tm.durusau.net/?feed=atom</id>
      <link href="http://tm.durusau.net" rel="alternate" type="text/html"/>
      <link href="http://tm.durusau.net/?feed=atom" rel="self" type="application/atom+xml"/>
      <subtitle xml:lang="en">Patrick Durusau on Topic Maps and Semantic Diversity</subtitle>
      <title xml:lang="en">Another Word For It</title>
      <updated>2012-05-16T00:17:25Z</updated>
    </source>
  </entry>

  <entry xml:lang="en">
    <id>http://tm.durusau.net/?p=25204</id>
    <link href="http://tm.durusau.net/?p=25204" rel="alternate" type="text/html"/>
    <link href="http://tm.durusau.net/?p=25204#comments" rel="replies" type="text/html"/>
    <link href="http://tm.durusau.net/?feed=atom&amp;p=25204" rel="replies" type="application/atom+xml"/>
    <title xml:lang="en">Finite State Automata in Lucene</title>
    <summary xml:lang="en">Finite State Automata in Lucene by Mike McCandless From the post: Lucene Revolution 2012 is now done, and the talk Robert and I gave went well! We showed how we are using automata (FSAs and FSTs) to make great improvements throughout Lucene. You can view the slides here. This was the first time I used Google [...]</summary>
    <content type="xhtml" xml:lang="en"><div xmlns="http://www.w3.org/1999/xhtml"><p><a href="http://blog.mikemccandless.com/2012/05/finite-state-automata-in-lucene.html">Finite State Automata in Lucene</a> by Mike McCandless</p>
<p>From the post:</p>
<blockquote><p><a href="http://www.lucenerevolution.com/">Lucene Revolution 2012</a> is now done, and the talk Robert and I gave went well! We showed how we are using automata (<a href="http://en.wikipedia.org/wiki/Finite-state_machine">FSA</a>s and <a href="http://en.wikipedia.org/wiki/Finite_state_transducer">FST</a>s) to make great improvements throughout Lucene.</p>
<p>You can view the slides <a href="https://docs.google.com/presentation/d/1Z7OYvKc5dHAXiVdMpk69uulpIT6A7FGfohjHx8fmHBU/edit">here</a>.</p>
<p>This was the first time I used <a href="http://docs.google.com/">Google Docs</a> exclusively for a talk, and I was impressed! The real-time collaboration was awesome: we each could see the edits the other was doing, live. You never have to “save” your document: instead, every time you make a change, the document is saved to a new revision and you can then use infinite undo, or step back through all revisions, to go back.</p>
<p>Finally, Google Docs covers the whole life-cycle of your talk: editing/iterating, presenting (it presents in full-screen just fine, but does require an internet connection; I exported to PDF ahead of time as a backup) and, finally, sharing with the rest of the world!</p></blockquote>
<p>I must confess to disappointment when I read at slide 23 that “multi-token synonyms mess up graph.”</p>
<p>Particularly since I suspect that not only do synonyms need to be “multi-token” but “multi-dimensional” as well.</p></div>
    </content>
    <updated>2012-05-14T23:12:23Z</updated>
    <published>2012-05-14T23:12:23Z</published>
    <category scheme="http://tm.durusau.net" term="Finite State Automata"/>
    <category scheme="http://tm.durusau.net" term="Lucene"/>
    <author>
      <name>Patrick Durusau</name>
    </author>
    <source>
      <id>http://tm.durusau.net/?feed=atom</id>
      <link href="http://tm.durusau.net" rel="alternate" type="text/html"/>
      <link href="http://tm.durusau.net/?feed=atom" rel="self" type="application/atom+xml"/>
      <subtitle xml:lang="en">Patrick Durusau on Topic Maps and Semantic Diversity</subtitle>
      <title xml:lang="en">Another Word For It</title>
      <updated>2012-05-15T21:49:58Z</updated>
    </source>
  </entry>

  <entry xml:lang="en">
    <id>http://tm.durusau.net/?p=25160</id>
    <link href="http://tm.durusau.net/?p=25160" rel="alternate" type="text/html"/>
    <link href="http://tm.durusau.net/?p=25160#comments" rel="replies" type="text/html"/>
    <link href="http://tm.durusau.net/?feed=atom&amp;p=25160" rel="replies" type="application/atom+xml"/>
    <title xml:lang="en">Sorting and Filtering Results in Custom Search</title>
    <summary xml:lang="en">Sorting and Filtering Results in Custom Search From the post: Using Custom Search Engine (CSE), you can create rich search experiences that make it easier for visitors to find the information they’re looking for on your site. Today we’re announcing two improvements to sorting and filtering of search results in CSE. First, CSE now supports [...]</summary>
    <content type="xhtml" xml:lang="en"><div xmlns="http://www.w3.org/1999/xhtml"><p><a href="http://googlecustomsearch.blogspot.com/2012/05/sorting-and-filtering-results-in-custom.html">Sorting and Filtering Results in Custom Search</a></p>
<p>From the post:</p>
<blockquote><p>Using <a href="http://www.google.com/cse/">Custom Search Engine </a>(CSE), you can create rich search experiences that make it easier for visitors to find the information they’re looking for on your site. Today we’re announcing two improvements to sorting and filtering of search results in CSE.</p>
<p>First, CSE now supports UI-based <a href="http://support.google.com/customsearch/bin/answer.py?hl=en&amp;answer=2549537">results sorting</a>, which you can enable in the Basics tab of the CSE control panel. Once you’ve updated the CSE element code on your site, a “sort by” picker will become visible at the top of the results section. </p></blockquote>
<p>I am not sure I would call this a “rich search experience” but I suppose any improvement is better than none at all.</p>
<p>I am curious how you would evaluate the use of “product rich snippets” as being similar to Newcomb’s conferral of properties. (See the post for “product rich snippets.”)</p>
<p>Or, for that matter, how would you, in an indexing context, “confer” additional information on an index entry that does not appear in the document, to be used when the index is searched?</p>
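<p>One way to picture conferral in an indexing context is a minimal Python sketch under stated assumptions. It is not how Custom Search or any particular engine works. In this toy inverted index, terms are attached to a document’s entry without appearing in its text and are then matched like ordinary terms at search time. A separate record keeps track of which terms were conferred. The <code>ConferringIndex</code> class and its methods are hypothetical.</p>
<pre>
```python
from collections import defaultdict

class ConferringIndex:
    """Toy inverted index: terms may be conferred on a document entry without
    occurring in the document's text, and are searched like any other term."""

    def __init__(self) -> None:
        self.postings: defaultdict[str, set[str]] = defaultdict(set)
        self.conferred: defaultdict[str, set[str]] = defaultdict(set)  # provenance only

    def add_document(self, doc_id: str, text: str) -> None:
        # Naive whitespace tokenization stands in for a real analyzer.
        for term in text.lower().split():
            self.postings[term].add(doc_id)

    def confer(self, doc_id: str, term: str) -> None:
        # The term joins the same postings used at search time, and is also
        # recorded separately so users can see it came from outside the text.
        term = term.lower()
        self.postings[term].add(doc_id)
        self.conferred[term].add(doc_id)

    def search(self, term: str) -> set[str]:
        return set(self.postings.get(term.lower(), set()))

    def was_conferred(self, doc_id: str, term: str) -> bool:
        return doc_id in self.conferred.get(term.lower(), set())

index = ConferringIndex()
index.add_document("d1", "Sorting and filtering results in Custom Search")
index.confer("d1", "CSE")
print(index.search("cse"))  # {'d1'}
print(index.was_conferred("d1", "cse"), index.was_conferred("d1", "sorting"))  # True False
```
</pre>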
<p>Comments? </p></div>
    </content>
    <updated>2012-05-14T22:51:55Z</updated>
    <published>2012-05-14T22:51:55Z</published>
    <category scheme="http://tm.durusau.net" term="Google CSE"/>
    <category scheme="http://tm.durusau.net" term="Searching"/>
    <author>
      <name>Patrick Durusau</name>
    </author>
    <source>
      <id>http://tm.durusau.net/?feed=atom</id>
      <link href="http://tm.durusau.net" rel="alternate" type="text/html"/>
      <link href="http://tm.durusau.net/?feed=atom" rel="self" type="application/atom+xml"/>
      <subtitle xml:lang="en">Patrick Durusau on Topic Maps and Semantic Diversity</subtitle>
      <title xml:lang="en">Another Word For It</title>
      <updated>2012-05-15T21:49:58Z</updated>
    </source>
  </entry>

  <entry xml:lang="en">
    <id>http://tm.durusau.net/?p=25199</id>
    <link href="http://tm.durusau.net/?p=25199" rel="alternate" type="text/html"/>
    <link href="http://tm.durusau.net/?p=25199#comments" rel="replies" type="text/html"/>
    <link href="http://tm.durusau.net/?feed=atom&amp;p=25199" rel="replies" type="application/atom+xml"/>
    <title xml:lang="en">CDG – Community Data Generator</title>
    <summary xml:lang="en">CDG – Community Data Generator From the post: CDG is a datawarehouse generator and the newest member of the Ctools family. Given the definition of dimensions that we want, CDG will randomize data within certain parameters and output 3 different things: Database and table ddl for the fact table A file with inserts for the [...]</summary>
    <content type="xhtml" xml:lang="en"><div xmlns="http://www.w3.org/1999/xhtml"><p><a href="http://pedroalves-bi.blogspot.com/2012/05/cdg-community-datawarehouse-generator.html">CDG – Community Data Generator</a></p>
<p>From the post:</p>
<blockquote><p>CDG is a datawarehouse generator and the newest member of the <a href="http://ctools.webdetails.org/">Ctools</a> family. Given the definition of dimensions that we want, CDG will randomize data within certain parameters and output 3 different things:</p>
<ul>
<li>Database and table ddl for the fact table</li>
<li>A file with inserts for the fact table</li>
<li>Mondrian schema file to be used within pentaho</li>
</ul>
<p>While most of the documentation mentions the usage within the scope of <a href="http://www.pentaho.com/">Pentaho</a> there’s absolutely nothing that prevents the resulting database to be used in different contexts.</p></blockquote>
<p>I had mentioned CTools before, but not in any detail. This post was the additional resource that made me pick them back up.</p>
<p>It isn’t hard to see how this data generator will be useful.</p>
<p>For subject-centric software, generating files with known “same subject” characteristics would be more useful. </p>
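<p>As one possible starting point, here is a minimal Python sketch of such a generator. It is an assumption, not a CDG feature. It emits name records in which several spellings refer to the same subject, and it returns the ground truth alongside them, so the output of a merging tool can be scored against a known answer. The name lists, the spelling rules and the <code>generate</code> function are all hypothetical.</p>
<pre>
```python
import itertools
import random

# Hypothetical name parts; first initials are distinct so abbreviated
# spellings of different subjects never collide.
FIRST = ["Alice", "Bruno", "Chloe", "Dmitri"]
LAST = ["Smith", "Okafor", "Tanaka", "Novak"]

# Ways the same person might be written in different source systems.
SPELLINGS = [
    lambda first, last: f"{first} {last}",
    lambda first, last: f"{last}, {first}",
    lambda first, last: f"{first[0]}. {last}",
    lambda first, last: f"{first} {last}".upper(),
]

def generate(n_subjects: int, seed: int = 0):
    """Return (records, truth): records are (record_id, name) pairs and
    truth maps each record_id to the subject it represents."""
    rng = random.Random(seed)
    people = list(itertools.product(FIRST, LAST))
    if n_subjects > len(people):
        raise ValueError(f"at most {len(people)} distinct subjects available")
    rng.shuffle(people)
    records, truth = [], {}
    for subject_id, (first, last) in enumerate(people[:n_subjects]):
        # Each subject appears under one to four distinct spellings.
        for spell in rng.sample(SPELLINGS, rng.randint(1, len(SPELLINGS))):
            record_id = f"r{len(records)}"
            records.append((record_id, spell(first, last)))
            truth[record_id] = subject_id
    rng.shuffle(records)  # hide the grouping from the tool under test
    return records, truth

records, truth = generate(5, seed=42)
for record_id, name in records:
    print(record_id, name)
```
</pre>
<p>A real test corpus would need harder cases, such as shared initials, typos and conflicting properties. Even this toy form, though, makes “same subject” a measurable property of the file rather than a guess.</p>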
<p>Thoughts, suggestions or pointers to work on generation of such files? </p></div>
    </content>
    <updated>2012-05-14T22:50:53Z</updated>
    <published>2012-05-14T22:50:53Z</published>
    <category scheme="http://tm.durusau.net" term="Ctools"/>
    <category scheme="http://tm.durusau.net" term="Data"/>
    <author>
      <name>Patrick Durusau</name>
    </author>
    <source>
      <id>http://tm.durusau.net/?feed=atom</id>
      <link href="http://tm.durusau.net" rel="alternate" type="text/html"/>
      <link href="http://tm.durusau.net/?feed=atom" rel="self" type="application/atom+xml"/>
      <subtitle xml:lang="en">Patrick Durusau on Topic Maps and Semantic Diversity</subtitle>
      <title xml:lang="en">Another Word For It</title>
      <updated>2012-05-15T21:49:58Z</updated>
    </source>
  </entry>

  <entry xml:lang="en">
    <id>http://tm.durusau.net/?p=25194</id>
    <link href="http://tm.durusau.net/?p=25194" rel="alternate" type="text/html"/>
    <link href="http://tm.durusau.net/?p=25194#comments" rel="replies" type="text/html"/>
    <link href="http://tm.durusau.net/?feed=atom&amp;p=25194" rel="replies" type="application/atom+xml"/>
    <title xml:lang="en">C*Tools</title>
    <summary xml:lang="en">C*Tools From the webpage: The CTools are a Webdetails Open Source project composed by a collection of Pentaho plugins. Its purpose is to streamline the implementation and design process, expanding even further the range of possibilities of Pentaho Dashboards. This page represents our effort to keep you up to date with the our latest developments. [...]</summary>
    <content type="xhtml" xml:lang="en"><div xmlns="http://www.w3.org/1999/xhtml"><p><a href="http://ctools.webdetails.org/">C*Tools</a></p>
<p>From the webpage:</p>
<blockquote><p>The CTools are a Webdetails Open Source project composed by a collection of Pentaho plugins. Its purpose is to streamline the implementation and design process, expanding even further the range of possibilities of Pentaho Dashboards. This page represents our effort to keep you up to date with the our latest developments. Have fun, dazzle your clients and build a “masterpiece of a Dashboard”. </p></blockquote>
<p>Tools include:</p>
<blockquote><p><strong><a href="http://ccc.webdetails.org/">CCC</a></strong>: Community Charting Components (CCC) is a charting library on top of Protovis, a very powerful free and open-source visualization toolkit.</p>
<p><strong><a href="http://cbf.webdetails.org/">CBF</a></strong>: Focused on a multi-project/ multi-environment scenario, the Community Build Framework (CBF) is the way to setup and deploy Pentaho based applications.</p>
<p><strong><a href="http://cda.webdetails.org/">CDA</a></strong>: Community Data Access (CDA) is a Pentaho plugin designed for accessing data with great flexibility. Born for overcoming some cons of the older implementation, CDA allows you to access any of the various Pentaho data sources and:</p>
<ul>
<li>join different datasources just by editing an XML file</li>
<li>cache queries providing a great boost in performance.</li>
<li>deliver data in different formats (csv, xls, etc.) through the Pentaho User </li>
</ul>
<p><strong><a href="http://cde.webdetails.org/">CDE</a></strong>: The Community Dashboard Editor (CDE) is the outcome of real-world needs: It was born to greatly simplify the creation, edition and rendering of dashboards.</p>
<p><strong><a href="http://cdf.webdetails.org/">CDF</a></strong>: Community Dashboard Framework (CDF) is a project that allows you to create friendly, powerful, fully featured dashboards on top of the Pentaho BI server. Former Pentaho dashboards had several drawbacks from a developer’s point of view. The developing process was awkward, it required know-how of web technologies and programming languages, and basically it was time-consuming. CDF emerged as a need for a framework that overcame all those difficulties. The final result is a powerful framework featuring the following:</p>
<ul>
<li>It is based on Open Source technologies.</li>
<li>It separates logic (JavaScript) of the presentation (HTML, CSS)</li>
<li>It features a life cycle with components interacting with each other</li>
<li>It uses AJAX</li>
<li>It is extensible, which gives the users a high level of customization. Advanced users can extend the library of components.</li>
<li>They also can insert their own snippets of JavaScript and jQuery code.</li>
</ul>
<p><strong><a href="http://cst.webdetails.org/">CST</a></strong>: Community Startup Tabs (CST) represents the easiest way to define and implement the Pentaho startup tabs depending on the user that logs into the PUC. Ranging from a single institutional page to a list of dashboards or reports among other contents, the tabs that each Pentaho user uses to open after loging into the PUC vary depending on the user preferences, or his/her role in the company. Then, why let Pentaho open always the same home page for everyone? The list of tabs to be opened automatically right after the login can be different depending on the user thanks to CST. Community Startup Tabs (CST) is a plugin with the following features:</p>
<ul>
<li>it allows you to define different startup tabs for each user that logs into the PUC. It is easy to configure.</li>
<li>it allows to define startup tabs based on user names or user roles.</li>
<li>for the definition of the startup tabs it allows you to specify user names or roles using regular expressions.</li>
</ul>
</blockquote>
<p>The trick to dashboards (as opposed to some, nameless, applications) is to deliver obviously useful options and information to users. </p></div>
    </content>
    <updated>2012-05-14T22:42:02Z</updated>
    <published>2012-05-14T22:42:02Z</published>
    <category scheme="http://tm.durusau.net" term="Ctools"/>
    <category scheme="http://tm.durusau.net" term="Dashboard"/>
    <category scheme="http://tm.durusau.net" term="Pentaho"/>
    <author>
      <name>Patrick Durusau</name>
    </author>
    <source>
      <id>http://tm.durusau.net/?feed=atom</id>
      <link href="http://tm.durusau.net" rel="alternate" type="text/html"/>
      <link href="http://tm.durusau.net/?feed=atom" rel="self" type="application/atom+xml"/>
      <subtitle xml:lang="en">Patrick Durusau on Topic Maps and Semantic Diversity</subtitle>
      <title xml:lang="en">Another Word For It</title>
      <updated>2012-05-15T20:59:39Z</updated>
    </source>
  </entry>

  <entry xml:lang="en">
    <id>http://tm.durusau.net/?p=25188</id>
    <link href="http://tm.durusau.net/?p=25188" rel="alternate" type="text/html"/>
    <link href="http://tm.durusau.net/?p=25188#comments" rel="replies" type="text/html"/>
    <link href="http://tm.durusau.net/?feed=atom&amp;p=25188" rel="replies" type="application/atom+xml"/>
    <title xml:lang="en">TREC Document Review Project on Hiatus, Recommind Asked to Withdraw</title>
    <summary xml:lang="en">TREC Document Review Project on Hiatus, Recommind Asked to Withdraw From the post: TREC Legal Track — part of the U.S. government’s Text Retrieval Conference — announced last week that the 2012 edition of its annual document review project for testing new systems is canceled, while prominent e-discovery software company Recommind confirmed that it’s been [...]</summary>
    <content type="xhtml" xml:lang="en"><div xmlns="http://www.w3.org/1999/xhtml"><p><a href="http://www.law.com/jsp/lawtechnologynews/PubArticleLTN.jsp?id=1202553285575&amp;TREC_Document_Review_Project_on_Hiatus_Recommind_Asked_to_Withdraw">TREC Document Review Project on Hiatus, Recommind Asked to Withdraw</a></p>
<p>From the post:</p>
<blockquote><p>TREC Legal Track — part of the U.S. government’s Text Retrieval Conference — announced last week that the 2012 edition of its annual document review project for testing new systems is canceled, while prominent e-discovery software company Recommind confirmed that it’s been asked to leave the project for prematurely sharing results.</p></blockquote>
<p>These difficulties highlight the need for: </p>
<ul>
<li><strong>open data sets</strong> and </li>
<li><strong>protocols for reporting of results as they occur.</strong></li>
</ul>
<p>That requires a data set with relevance judgments and other work.</p>
<p>Have you thought about the <a href="http://lucene.apache.org/openrelevance/">Open Relevance Project</a> at the Apache Software Foundation?</p>
<p>Email archives from Apache projects, the backbone of the web as we know it, are ripe for your contributions. </p>
<p>Let me be the first to ask <a href="http://www.recommind.com/">Recommind</a> to join in building a public data set for everyone.</p></div>
    </content>
    <updated>2012-05-14T17:47:46Z</updated>
    <published>2012-05-14T17:47:46Z</published>
    <category scheme="http://tm.durusau.net" term="Data Mining"/>
    <category scheme="http://tm.durusau.net" term="Data Source"/>
    <category scheme="http://tm.durusau.net" term="Open Relevance Project"/>
    <category scheme="http://tm.durusau.net" term="TREC"/>
    <author>
      <name>Patrick Durusau</name>
    </author>
    <source>
      <id>http://tm.durusau.net/?feed=atom</id>
      <link href="http://tm.durusau.net" rel="alternate" type="text/html"/>
      <link href="http://tm.durusau.net/?feed=atom" rel="self" type="application/atom+xml"/>
      <subtitle xml:lang="en">Patrick Durusau on Topic Maps and Semantic Diversity</subtitle>
      <title xml:lang="en">Another Word For It</title>
      <updated>2012-05-15T19:20:26Z</updated>
    </source>
  </entry>

  <entry xml:lang="en">
    <id>http://tm.durusau.net/?p=25185</id>
    <link href="http://tm.durusau.net/?p=25185" rel="alternate" type="text/html"/>
    <link href="http://tm.durusau.net/?p=25185#comments" rel="replies" type="text/html"/>
    <link href="http://tm.durusau.net/?feed=atom&amp;p=25185" rel="replies" type="application/atom+xml"/>
    <title xml:lang="en">ETL 2.0 – Data Integration Comes of Age</title>
    <summary xml:lang="en">ETL 2.0 – Data Integration Comes of Age by Robin Bloor PhD &amp; Rebecca Jozwiak. Well…., sort of. It is a “white paper” and all that implies but when you read: Versatility of Transformations and Scalability All ETL products provide some transformations but few are versatile. Useful transformations may involve translating data formats and coded [...]</summary>
    <content type="xhtml" xml:lang="en"><div xmlns="http://www.w3.org/1999/xhtml"><p><a href="http://www.enterpriseiq.com.au/whitepapers-aamp-research-mainmenu-66/277-etl-20-data-integration-comes-of-age">ETL 2.0 – Data Integration Comes of Age</a> by Robin Bloor PhD &amp; Rebecca Jozwiak.</p>
<p>Well… sort of.</p>
<p>It is a “white paper” and all that implies but when you read:</p>
<blockquote><p><strong>Versatility of Transformations and Scalability</strong></p>
<p>All ETL products provide some transformations but few are versatile. Useful transformations may involve translating data formats and coded values between the data sources and the target (if they are, or need to be, different). They may involve deriving calculated values, sorting data, aggregating data, or joining data. They may involve transposing data (from columns to rows) or transposing single columns into multiple columns. They may involve performing look-ups and substituting actual values with looked-up values accordingly, applying validations (and rejecting records that fail) and more. If the ETL tool cannot perform such transformations, they will have to be hand coded elsewhere – in the database or in an application.</p>
<p>It is extremely useful if transformations can draw data from multiple sources and data joins can be performed between such sources “in flight,” eliminating the need for costly and complex staging. Ideally, an ETL 2.0 product will be rich in transformation options since its role is to eliminate the need for direct coding all such data transformations.</p></blockquote>
<p>you start to lose what little respect you had for industry “white papers.”</p>
<p>Not once in this white paper is the term “semantics” used. It is also innocent of using the term “documentation.” </p>
<p>Don’t you think an ETL 2.0 application should enable re-use of “useful transformations?” </p>
<p>Wouldn’t that be a good thing? </p>
<p>Instead of IT staff starting from zero with every transformation request?</p>
<p>Failure to capture the semantics of data leaves you at ETL 2.0, while everyone else is at ETL 3.0. </p>
<p>What does your business sense tell you about that choice?</p>
<p>(ETL 3.0 – Documented, re-usable semantics for data and data structures. Enables development of transformation modules for particular data sources.)</p></div>
    </content>
    <updated>2012-05-14T17:18:38Z</updated>
    <published>2012-05-14T17:18:38Z</published>
    <category scheme="http://tm.durusau.net" term="Data Integration"/>
    <category scheme="http://tm.durusau.net" term="ETL"/>
    <author>
      <name>Patrick Durusau</name>
    </author>
    <source>
      <id>http://tm.durusau.net/?feed=atom</id>
      <link href="http://tm.durusau.net" rel="alternate" type="text/html"/>
      <link href="http://tm.durusau.net/?feed=atom" rel="self" type="application/atom+xml"/>
      <subtitle xml:lang="en">Patrick Durusau on Topic Maps and Semantic Diversity</subtitle>
      <title xml:lang="en">Another Word For It</title>
      <updated>2012-05-15T19:20:26Z</updated>
    </source>
  </entry>

  <entry xml:lang="en">
    <id>http://tm.durusau.net/?p=25180</id>
    <link href="http://tm.durusau.net/?p=25180" rel="alternate" type="text/html"/>
    <link href="http://tm.durusau.net/?p=25180#comments" rel="replies" type="text/html"/>
    <link href="http://tm.durusau.net/?feed=atom&amp;p=25180" rel="replies" type="application/atom+xml"/>
    <title xml:lang="en">Web Developers Can Now Easily “Play” with RDFa</title>
    <summary xml:lang="en">Web Developers Can Now Easily “Play” with RDFa by Eric Franzon. From the post: Yesterday, we announced RDFa.info, a new site devoted to helping developers add RDFa (Resource Description Framework-in-attributes) to HTML. Building on that work, the team behind RDFa.info is announcing today the release of “PLAY,” a live RDFa editor and visualization tool. This [...]</summary>
    <content type="xhtml" xml:lang="en"><div xmlns="http://www.w3.org/1999/xhtml"><p><a href="http://semanticweb.com/web-developers-can-now-easily-play-with-rdfa_b28890">Web Developers Can Now Easily “Play” with RDFa</a> by Eric Franzon.</p>
<p>From the post:</p>
<blockquote><p>Yesterday, we <a href="http://semanticweb.com/new-resource-for-web-developers-announced-add-linked-data-to-html_b28813">announced</a> RDFa.info, a new site devoted to helping developers add RDFa (Resource Description Framework-in-attributes) to HTML.</p>
<p>Building on that work, the team behind <a href="http://rdfa.info/">RDFa.info</a> is announcing today the release of “<a href="http://rdfa.info/play/">PLAY</a>,” a live RDFa editor and visualization tool. This release marks a significant step in providing tools for web developers that are easy to use, even for those unaccustomed to working with RDFa.</p>
<p>“Play” is an effort that serves several purposes. It is an authoring environment and markup debugger for RDFa that also serves as a teaching and education tool for Web Developers. As Alex Milowski, one of the core RDFa.info team, said, “It can be used for purposes of experimentation, documentation (e.g. crafting an example that produces certain triples), and testing. If you want to know what markup will produce what kind of properties (triples), this tool is going to be great for understanding how you should be structuring your own data.”</p></blockquote>
<p>A useful site for learning RDFa that is open to contributions, such as examples and documentation. </p></div>
    </content>
    <updated>2012-05-14T14:16:54Z</updated>
    <published>2012-05-14T14:16:54Z</published>
    <category scheme="http://tm.durusau.net" term="RDF"/>
    <category scheme="http://tm.durusau.net" term="RDFa"/>
    <category scheme="http://tm.durusau.net" term="Semantic Web"/>
    <author>
      <name>Patrick Durusau</name>
    </author>
    <source>
      <id>http://tm.durusau.net/?feed=atom</id>
      <link href="http://tm.durusau.net" rel="alternate" type="text/html"/>
      <link href="http://tm.durusau.net/?feed=atom" rel="self" type="application/atom+xml"/>
      <subtitle xml:lang="en">Patrick Durusau on Topic Maps and Semantic Diversity</subtitle>
      <title xml:lang="en">Another Word For It</title>
      <updated>2012-05-15T19:03:55Z</updated>
    </source>
  </entry>

  <entry xml:lang="en">
    <id>http://tm.durusau.net/?p=25177</id>
    <link href="http://tm.durusau.net/?p=25177" rel="alternate" type="text/html"/>
    <link href="http://tm.durusau.net/?p=25177#comments" rel="replies" type="text/html"/>
    <link href="http://tm.durusau.net/?feed=atom&amp;p=25177" rel="replies" type="application/atom+xml"/>
    <title xml:lang="en">Cloudera Manager 4.0 Beta released</title>
    <summary xml:lang="en">Cloudera Manager 4.0 Beta released by Aparna Ramani From the post: We’re happy to announce the Beta release of Cloudera Manager 4.0. This version of Cloudera Manager includes support for CDH4 Beta2 and several new features for both the Free edition and the Enterprise edition. This is the last beta before the GA release. The [...]</summary>
    <content type="xhtml" xml:lang="en"><div xmlns="http://www.w3.org/1999/xhtml"><p><a href="http://www.cloudera.com/blog/2012/05/cloudera-manager-4-0-beta-released/">Cloudera Manager 4.0 Beta</a> released by Aparna Ramani.</p>
<p>From the post:</p>
<blockquote><p>We’re happy to announce the Beta release of Cloudera Manager 4.0. </p>
<p>This version of Cloudera Manager includes support for <a href="http://www.cloudera.com/blog/2012/04/introducing-cdh4-beta-2/">CDH4 Beta2</a> and several new features for both the <a href="https://ccp.cloudera.com/display/FREE400BETA/New+Features+in+Cloudera+Manager+Free+Edition+4.0">Free edition</a> and the <a href="https://ccp.cloudera.com/display/ENT400BETA/New+Features+in+Cloudera+Manager+4.0">Enterprise edition</a>.</p></blockquote>
<p>This is the last beta before the GA release.</p>
<p>The details on CDH4 Beta 2 are:</p>
<blockquote>
<p>I’m pleased to inform our users and customers that we have released the Cloudera’s Distribution Including Apache Hadoop version 4 (CDH4) 2nd and final beta today. We received great feedback from the community from the first beta and this release incorporates that feedback as well as a number of new enhancements.</p>
<p>CDH4 has a great many enhancements compared to CDH3.</p>
<ul>
<li>Availability – a high availability namenode, better job isolation, improved hard disk failure handling, and multi-version support</li>
<li>Utilization – multiple namespaces and a slot-less resource management model</li>
<li>Performance – improvements in HBase, HDFS, MapReduce, Flume and compression performance</li>
<li>Usability – broader BI support, expanded API options, a more responsive Hue with broader browser support</li>
<li>Extensibility – HBase co-processors enable developers to create new kinds of real-time big data applications, the new MapReduce resource management model enables developers to run new data processing paradigms on the same cluster resources and storage</li>
<li>Security – HBase table &amp; column level security and Zookeeper authentication support</li>
</ul>
<p><strong>Some items of note about this beta:</strong></p>
<p>This is the second (and final) beta for CDH4, and this version has all of the major component changes that we’ve planned to incorporate before the platform goes GA.  The second beta:</p>
<ul>
<li>Incorporates the Apache Flume, Hue, Apache Oozie and Apache Whirr components that did not make the first beta</li>
<li>Broadens the platform support back out to our normal release matrix of Red Hat, CentOS, SUSE, Ubuntu and Debian</li>
<li>Standardizes our release matrix of supported databases to include MySQL, PostgreSQL and Oracle</li>
<li>Includes a number of improvements to existing components like adding auto-failover support to HDFS’s high availability feature and adding multi-homing support to HDFS and MapReduce</li>
<li>Incorporates a number of fixes that were identified during the first beta period like removing a HBase performance regression</li>
</ul>
</blockquote>
<p>Not as romantic as your subject analysis activities, but someone has to manage the systems that implement your analysis! </p>
<p>Not to mention that skills here will make you more attractive in any big data context.</p></div>
    </content>
    <updated>2012-05-14T13:49:21Z</updated>
    <published>2012-05-14T13:49:21Z</published>
    <category scheme="http://tm.durusau.net" term="Cloudera"/>
    <category scheme="http://tm.durusau.net" term="Hadoop"/>
    <category scheme="http://tm.durusau.net" term="MapReduce"/>
    <author>
      <name>Patrick Durusau</name>
    </author>
    <source>
      <id>http://tm.durusau.net/?feed=atom</id>
      <link href="http://tm.durusau.net" rel="alternate" type="text/html"/>
      <link href="http://tm.durusau.net/?feed=atom" rel="self" type="application/atom+xml"/>
      <subtitle xml:lang="en">Patrick Durusau on Topic Maps and Semantic Diversity</subtitle>
      <title xml:lang="en">Another Word For It</title>
      <updated>2012-05-14T23:50:01Z</updated>
    </source>
  </entry>

  <entry xml:lang="en">
    <id>http://tm.durusau.net/?p=25174</id>
    <link href="http://tm.durusau.net/?p=25174" rel="alternate" type="text/html"/>
    <link href="http://tm.durusau.net/?p=25174#comments" rel="replies" type="text/html"/>
    <link href="http://tm.durusau.net/?feed=atom&amp;p=25174" rel="replies" type="application/atom+xml"/>
    <title xml:lang="en">Lucene conference touches many areas of growth in search</title>
    <summary xml:lang="en">Lucene conference touches many areas of growth in search by Andy Oram. From the post: With a modern search engine and smart planning, web sites can provide visitors with a better search experience than Google. For instance, Google may well turn up interesting results if you search for a certain kind of shirt, but a [...]</summary>
    <content type="xhtml" xml:lang="en"><div xmlns="http://www.w3.org/1999/xhtml"><p>Lucene conference touches many areas of growth in search by Andy Oram.</p>
<p>From the post:</p>
<blockquote><p>With a modern search engine and smart planning, web sites can provide visitors with a better search experience than Google. For instance, Google may well turn up interesting results if you search for a certain kind of shirt, but a well-designed clothing site can also pull up related trousers, skirts, and accessories. It’s not Google’s job to understand the intricate interrelationships of data on a particular web property, but the site’s own team can constantly tune searches to reflect what the site has to offer and what its visitors uniquely need.</p>
<p>Hence the importance of search engines like Solr, based on the Lucene library. Both are open source Apache projects, maintained by Lucid Imagination, a company founded to commercialize the underlying technology. I attended parts of Lucid Imagination’s conference this week, <a href="http://www.lucenerevolution.com/">Lucene Revolution</a>, and found Lucene evolving in the ways much of the computer industry is headed.</p></blockquote>
<p>Andy’s summary of the conference will make you wonder two things:</p>
<ol>
<li>Why weren’t you at the Lucene Revolution conference this year?</li>
<li>Where are the videos from Lucene Revolution 2012?</li>
</ol>
<p>I won’t ever be able to answer #1, but I will post an answer to #2 as soon as it is available. </p></div>
    </content>
    <updated>2012-05-14T13:35:38Z</updated>
    <published>2012-05-14T13:35:38Z</published>
    <category scheme="http://tm.durusau.net" term="BigData"/>
    <category scheme="http://tm.durusau.net" term="Lucene"/>
    <category scheme="http://tm.durusau.net" term="LucidWorks"/>
    <category scheme="http://tm.durusau.net" term="Solr"/>
    <author>
      <name>Patrick Durusau</name>
    </author>
    <source>
      <id>http://tm.durusau.net/?feed=atom</id>
      <link href="http://tm.durusau.net" rel="alternate" type="text/html"/>
      <link href="http://tm.durusau.net/?feed=atom" rel="self" type="application/atom+xml"/>
      <subtitle xml:lang="en">Patrick Durusau on Topic Maps and Semantic Diversity</subtitle>
      <title xml:lang="en">Another Word For It</title>
      <updated>2012-05-14T23:50:01Z</updated>
    </source>
  </entry>

  <entry xml:lang="en">
    <id>http://tm.durusau.net/?p=25171</id>
    <link href="http://tm.durusau.net/?p=25171" rel="alternate" type="text/html"/>
    <link href="http://tm.durusau.net/?p=25171#comments" rel="replies" type="text/html"/>
    <link href="http://tm.durusau.net/?feed=atom&amp;p=25171" rel="replies" type="application/atom+xml"/>
    <title xml:lang="en">Feynman on Curiosity</title>
    <summary xml:lang="en">Feynman on Curiosity Ethan Fosse embeds a short video of Feynman on curiosity. I created a category of “curiosity” today, belatedly. Curiosity is largely responsible for the variety of materials and resources on this blog. I try to point out what may be helpful in your current or next project. But I am also curious [...]</summary>
    <content type="xhtml" xml:lang="en"><div xmlns="http://www.w3.org/1999/xhtml"><p><a href="http://ethanfosse.blogspot.ca/2012/04/feynman-on-curiosity.html">Feynman on Curiosity</a></p>
<p>Ethan Fosse embeds a short video of Feynman on curiosity.</p>
<p>I created a category of “curiosity” today, belatedly. </p>
<p>Curiosity is largely responsible for the variety of materials and resources on this blog.</p>
<p>I try to point out what may be helpful in your current or next project.</p>
<p>But I am also curious about what lies just beyond the technique or data I have just discussed.</p>
<p>Enjoy!</p></div>
    </content>
    <updated>2012-05-14T09:13:44Z</updated>
    <published>2012-05-14T09:13:44Z</published>
    <category scheme="http://tm.durusau.net" term="Curiosity"/>
    <author>
      <name>Patrick Durusau</name>
    </author>
    <source>
      <id>http://tm.durusau.net/?feed=atom</id>
      <link href="http://tm.durusau.net" rel="alternate" type="text/html"/>
      <link href="http://tm.durusau.net/?feed=atom" rel="self" type="application/atom+xml"/>
      <subtitle xml:lang="en">Patrick Durusau on Topic Maps and Semantic Diversity</subtitle>
      <title xml:lang="en">Another Word For It</title>
      <updated>2012-05-14T17:47:46Z</updated>
    </source>
  </entry>

  <entry xml:lang="en">
    <id>http://tm.durusau.net/?p=25168</id>
    <link href="http://tm.durusau.net/?p=25168" rel="alternate" type="text/html"/>
    <link href="http://tm.durusau.net/?p=25168#comments" rel="replies" type="text/html"/>
    <link href="http://tm.durusau.net/?feed=atom&amp;p=25168" rel="replies" type="application/atom+xml"/>
    <title xml:lang="en">Synonyms in the TMDM Legend</title>
    <summary xml:lang="en">I was going over some notes on synonyms this weekend when it occurred to me to ask: How many synonyms does a topic item have in the TMDM legend? A synonym being when one term can be freely substituted for another. Not wanting to trust my memory, I quote from the TMDM legend (ISO/IEC 13250-2): [...]</summary>
    <content type="xhtml" xml:lang="en"><div xmlns="http://www.w3.org/1999/xhtml"><p>I was going over some notes on synonyms this weekend when it occurred to me to ask:</p>
<p><em>How many synonyms does a topic item have in the TMDM legend?</em></p>
<p>By “synonym” I mean a term that can be freely substituted for another.</p>
<p>Not wanting to trust my memory, I quote from the TMDM legend (ISO/IEC 13250-2):</p>
<blockquote><p>Two topic items are equal if they have: </p>
<ul>
<li>at least one equal string in their [subject identifiers] properties,</li>
<li>at least one equal string in their [item identifiers] properties,</li>
<li>at least one equal string in their [subject locators] properties,</li>
<li>an equal string in the [subject identifiers] property of the one topic item and the [item identifiers] property of the other, or</li>
<li>the same information item in their [reified] properties.</li>
</ul>
</blockquote>
<p>The wording is a bit awkward for my point about synonyms, but I take it that if two topics had</p>
<blockquote><p>at least one equal string in their [subject identifiers] properties,</p></blockquote>
<p>I could substitute:</p>
<blockquote><p>at least one equal string in their [item identifiers] properties, (in all relevant places)</p></blockquote>
<p>and have the same effect. </p>
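<p>To make the quoted equality rule concrete, here is a minimal Python sketch. The <code>TopicItem</code> class and the <code>tmdm_equal</code> function are hypothetical, not the API of any topic map engine; each identifier property is assumed to be a set of strings, and [reified] is assumed to hold a reference to an information item, or nothing. The five conditions are simply OR-ed together, so any one of them suffices.</p>
<pre>
```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TopicItem:
    """Hypothetical, simplified stand-in for a TMDM topic item."""
    subject_identifiers: set = field(default_factory=set)
    item_identifiers: set = field(default_factory=set)
    subject_locators: set = field(default_factory=set)
    reified: Optional[object] = None  # information item this topic reifies, if any


def shares_string(a: set, b: set) -> bool:
    """True if the two sets have at least one equal string."""
    return not a.isdisjoint(b)


def tmdm_equal(t1: TopicItem, t2: TopicItem) -> bool:
    """Any one of the five quoted conditions is enough for equality."""
    return (
        shares_string(t1.subject_identifiers, t2.subject_identifiers)
        or shares_string(t1.item_identifiers, t2.item_identifiers)
        or shares_string(t1.subject_locators, t2.subject_locators)
        # [subject identifiers] of one vs [item identifiers] of the other, both ways
        or shares_string(t1.subject_identifiers, t2.item_identifiers)
        or shares_string(t1.item_identifiers, t2.subject_identifiers)
        # the same information item (identity, not mere equality) in [reified]
        or (t1.reified is not None and t1.reified is t2.reified)
    )
```
</pre>
<p>With this sketch, a topic carrying <code>http://example.org/x</code> as a subject identifier and a topic carrying the same string as an item identifier compare equal, which bears on the substitution between properties discussed above. The cross-property test runs in both directions because the legend speaks of “the one topic item” and “the other.”</p>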
<p>I am going to be exploring the use of synonym-based processing for TMDM governed topic maps.</p>
<p>Any thoughts or insights would be greatly appreciated. </p></div>
    </content>
    <updated>2012-05-14T03:10:14Z</updated>
    <published>2012-05-14T03:10:14Z</published>
    <category scheme="http://tm.durusau.net" term="Synonymy"/>
    <category scheme="http://tm.durusau.net" term="TMDM"/>
    <author>
      <name>Patrick Durusau</name>
    </author>
    <source>
      <id>http://tm.durusau.net/?feed=atom</id>
      <link href="http://tm.durusau.net" rel="alternate" type="text/html"/>
      <link href="http://tm.durusau.net/?feed=atom" rel="self" type="application/atom+xml"/>
      <subtitle xml:lang="en">Patrick Durusau on Topic Maps and Semantic Diversity</subtitle>
      <title xml:lang="en">Another Word For It</title>
      <updated>2012-05-14T17:47:46Z</updated>
    </source>
  </entry>

  <entry xml:lang="en">
    <id>http://tm.durusau.net/?p=25165</id>
    <link href="http://tm.durusau.net/?p=25165" rel="alternate" type="text/html"/>
    <link href="http://tm.durusau.net/?p=25165#comments" rel="replies" type="text/html"/>
    <link href="http://tm.durusau.net/?feed=atom&amp;p=25165" rel="replies" type="application/atom+xml"/>
    <title xml:lang="en">Tika – A content analysis toolkit</title>
    <summary xml:lang="en">Tika – A content analysis toolkit From the webpage: The Apache Tika™ toolkit detects and extracts metadata and structured text content from various documents using existing parser libraries. You can find the latest release on the download page. See the Getting Started guide for instructions on how to start using Tika. From the supported formats [...]</summary>
    <content type="xhtml" xml:lang="en"><div xmlns="http://www.w3.org/1999/xhtml"><p><a href="http://tika.apache.org/">Tika – A content analysis toolkit</a></p>
<p>From the webpage:</p>
<blockquote><p>The Apache Tika™ toolkit detects and extracts metadata and structured text content from various documents using existing parser libraries. You can find the latest release on the <a href="http://tika.apache.org/download.html">download page</a>. See the <a href="http://tika.apache.org/1.1/gettingstarted.html">Getting Started</a> guide for instructions on how to start using Tika.
</p></blockquote>
<p>From the supported formats page:</p>
<blockquote>
<ul>
<li>HyperText Markup Language</li>
<li>XML and derived formats</li>
<li>Microsoft Office document formats</li>
<li>OpenDocument Format</li>
<li>Portable Document Format</li>
<li>Electronic Publication Format</li>
<li>Rich Text Format</li>
<li>Compression and packaging formats</li>
<li>Text formats</li>
<li>Audio formats</li>
<li>Image formats</li>
<li>Video formats</li>
<li>Java class files and archives</li>
<li>The mbox format</li>
</ul>
</blockquote>
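<p>Tika’s own detection does considerably more, but a minimal Python sketch of signature (“magic number”) based format detection, one common technique for the detect step, shows the basic idea. The <code>MAGIC</code> table and the <code>detect</code> function are hypothetical, cover only a handful of the formats listed above, and are not Tika’s API.</p>
<pre>
```python
# Hypothetical, tiny signature table: (leading bytes, media type).
# Real detectors use far larger tables plus file-name and container hints.
MAGIC = [
    (b"%PDF-", "application/pdf"),
    (b"PK\x03\x04", "application/zip"),  # also the container for ODF, OOXML and EPUB
    (b"{\\rtf", "application/rtf"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"\xca\xfe\xba\xbe", "application/java-vm"),  # Java class file
    (b"From ", "application/mbox"),  # weak heuristic: plain text may start this way too
]


def detect(data: bytes) -> str:
    """Return the media type of the first matching signature, else a generic type."""
    for signature, media_type in MAGIC:
        if data.startswith(signature):
            return media_type
    return "application/octet-stream"
```
</pre>
<p>A bare ZIP signature cannot tell OpenDocument, Office Open XML and EPUB files apart, since all three are ZIP containers; a fuller detector has to look inside the package.</p>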
<p>One suspects that even the vastness of “dark data” has a finite number of formats. </p>
<p>Tika may not cover all of them, but perhaps enough to get you started. </p></div>
    </content>
    <updated>2012-05-14T02:59:29Z</updated>
    <published>2012-05-14T02:59:29Z</published>
    <category scheme="http://tm.durusau.net" term="Content Analysis"/>
    <category scheme="http://tm.durusau.net" term="Tika"/>
    <author>
      <name>Patrick Durusau</name>
    </author>
    <source>
      <id>http://tm.durusau.net/?feed=atom</id>
      <link href="http://tm.durusau.net" rel="alternate" type="text/html"/>
      <link href="http://tm.durusau.net/?feed=atom" rel="self" type="application/atom+xml"/>
      <subtitle xml:lang="en">Patrick Durusau on Topic Maps and Semantic Diversity</subtitle>
      <title xml:lang="en">Another Word For It</title>
      <updated>2012-05-14T17:47:46Z</updated>
    </source>
  </entry>

  <entry xml:lang="en">
    <id>http://tm.durusau.net/?p=25162</id>
    <link href="http://tm.durusau.net/?p=25162" rel="alternate" type="text/html"/>
    <link href="http://tm.durusau.net/?p=25162#comments" rel="replies" type="text/html"/>
    <link href="http://tm.durusau.net/?feed=atom&amp;p=25162" rel="replies" type="application/atom+xml"/>
    <title xml:lang="en">Are visual dictionaries generalizable?</title>
    <summary xml:lang="en">Are visual dictionaries generalizable? by Otavio A. B. Penatti, Eduardo Valle, and Ricardo da S. Torres Abstract: Mid-level features based on visual dictionaries are today a cornerstone of systems for classification and retrieval of images. Those state-of-the-art representations depend crucially on the choice of a codebook (visual dictionary), which is usually derived from the dataset. [...]</summary>
    <content type="xhtml" xml:lang="en"><div xmlns="http://www.w3.org/1999/xhtml"><p>Are visual dictionaries generalizable? by Otavio A. B. Penatti, Eduardo Valle, and Ricardo da S. Torres</p>
<p>Abstract:</p>
<blockquote><p>Mid-level features based on visual dictionaries are today a cornerstone of systems for classification and retrieval of images. Those state-of-the-art representations depend crucially on the choice of a codebook (visual dictionary), which is usually derived from the dataset. In general-purpose, dynamic image collections (e.g., the Web), one cannot have the entire collection in order to extract a representative dictionary. However, based on the hypothesis that the dictionary reflects only the diversity of low-level appearances and does not capture semantics, we argue that a dictionary based on a small subset of the data, or even on an entirely different dataset, is able to produce a good representation, provided that the chosen images span a diverse enough portion of the low-level feature space. Our experiments confirm that hypothesis, opening the opportunity to greatly alleviate the burden in generating the codebook, and confirming the feasibility of employing visual dictionaries in large-scale dynamic environments. </p></blockquote>
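<p>For readers new to visual dictionaries, this toy Python sketch shows the general bag-of-visual-words idea the abstract builds on: cluster a sample of local descriptors into a codebook with k-means, then describe an image as a normalized histogram of its descriptors’ nearest codewords. It is not the authors’ pipeline; the two-dimensional tuples stand in for real local feature descriptors, and the function names are hypothetical.</p>
<pre>
```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(descriptor, codebook):
    """Index of the codeword closest to the descriptor."""
    return min(range(len(codebook)), key=lambda i: squared_distance(descriptor, codebook[i]))


def build_codebook(sample, k, iterations=10, seed=0):
    """Plain k-means (Lloyd's algorithm) over a sample of descriptors."""
    rng = random.Random(seed)
    codebook = rng.sample(sample, k)  # k descriptors drawn without replacement
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for d in sample:
            clusters[nearest(d, codebook)].append(d)
        for i, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous center
                codebook[i] = tuple(sum(col) / len(members) for col in zip(*members))
    return codebook


def bag_of_visual_words(descriptors, codebook):
    """Normalized histogram of nearest-codeword assignments for one image."""
    hist = [0.0] * len(codebook)
    for d in descriptors:
        hist[nearest(d, codebook)] += 1
    total = sum(hist)
    return [h / total for h in hist] if total else hist
```
</pre>
<p>Nothing in <code>bag_of_visual_words</code> requires the codebook sample to come from the collection being described; whether a codebook built from a small subset, or from another dataset entirely, still yields a good representation is the empirical question the paper addresses.</p>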
<p>The authors use the <a href="http://en.wikipedia.org/wiki/Caltech_101">Caltech-101</a> image set because of its “diversity.” Odd, because they cite the <a href="http://www.vision.caltech.edu/Image_Datasets/Caltech256/">Caltech-256</a> image set, which was created to answer concerns about the <em>lack of diversity</em> in the Caltech-101 image set.</p>
<p>Not sure this paper answers the issues it raises about visual dictionaries. </p>
<p>Wanted to bring it to your attention because representative dictionaries (as opposed to comprehensive ones) may be lurking just beyond the semantic horizon. </p></div>
    </content>
    <updated>2012-05-14T00:54:39Z</updated>
    <published>2012-05-14T00:54:39Z</published>
    <category scheme="http://tm.durusau.net" term="Classification"/>
    <category scheme="http://tm.durusau.net" term="Dictionary"/>
    <category scheme="http://tm.durusau.net" term="Image Recognition"/>
    <category scheme="http://tm.durusau.net" term="Information Retrieval"/>
    <author>
      <name>Patrick Durusau</name>
    </author>
    <source>
      <id>http://tm.durusau.net/?feed=atom</id>
      <link href="http://tm.durusau.net" rel="alternate" type="text/html"/>
      <link href="http://tm.durusau.net/?feed=atom" rel="self" type="application/atom+xml"/>
      <subtitle xml:lang="en">Patrick Durusau on Topic Maps and Semantic Diversity</subtitle>
      <title xml:lang="en">Another Word For It</title>
      <updated>2012-05-14T17:47:46Z</updated>
    </source>
  </entry>

  <entry xml:lang="en">
    <id>http://tm.durusau.net/?p=25157</id>
    <link href="http://tm.durusau.net/?p=25157" rel="alternate" type="text/html"/>
    <link href="http://tm.durusau.net/?p=25157#comments" rel="replies" type="text/html"/>
    <link href="http://tm.durusau.net/?feed=atom&amp;p=25157" rel="replies" type="application/atom+xml"/>
    <title xml:lang="en">Zero Tolerance Search : 24 year old neuroscientist</title>
    <summary xml:lang="en">Zero Tolerance Search : 24 year old neuroscientist Matthew Hurst writes: [The idea behind 'zero tolerance search' posts is to illustrate real life search interactions that show how far we have to go in leveraging the explicit and implicit data in the web and elsewhere.] Yesterday, I heard part of an interview on NPR. The [...]</summary>
    <content type="xhtml" xml:lang="en"><div xmlns="http://www.w3.org/1999/xhtml"><p><a href="http://datamining.typepad.com/data_mining/2012/05/zero-tolerance-search-24-year-old-neuroscientist.html">Zero Tolerance Search : 24 year old neuroscientist</a></p>
<p>Matthew Hurst writes:</p>
<blockquote><p>[The idea behind 'zero tolerance search' posts is to illustrate real life search interactions that show how far we have to go in leveraging the explicit and implicit data in the web and elsewhere.]</p>
<p>Yesterday, I heard part of an interview on NPR. The interview was around a new book on determinism and neuroscience. The only thing I remember about the author was his young age. I wanted to recover the name of the author and the title of his new book so that I could comment on his argument against determinism (which was, essentially, ‘I’m afraid of determinism therefore it can’t be right’).</p></blockquote>
<p>Matthew continues to outline how the text matching of major search engines fails. </p>
<p>How would you improve the results? </p></div>
    </content>
    <updated>2012-05-13T23:48:53Z</updated>
    <published>2012-05-13T23:48:53Z</published>
    <category scheme="http://tm.durusau.net" term="Search Engines"/>
    <category scheme="http://tm.durusau.net" term="Searching"/>
    <author>
      <name>Patrick Durusau</name>
    </author>
    <source>
      <id>http://tm.durusau.net/?feed=atom</id>
      <link href="http://tm.durusau.net" rel="alternate" type="text/html"/>
      <link href="http://tm.durusau.net/?feed=atom" rel="self" type="application/atom+xml"/>
      <subtitle xml:lang="en">Patrick Durusau on Topic Maps and Semantic Diversity</subtitle>
      <title xml:lang="en">Another Word For It</title>
      <updated>2012-05-14T17:47:46Z</updated>
    </source>
  </entry>

  <entry xml:lang="en">
    <id>http://tm.durusau.net/?p=25153</id>
    <link href="http://tm.durusau.net/?p=25153" rel="alternate" type="text/html"/>
    <link href="http://tm.durusau.net/?p=25153#comments" rel="replies" type="text/html"/>
    <link href="http://tm.durusau.net/?feed=atom&amp;p=25153" rel="replies" type="application/atom+xml"/>
    <title xml:lang="en">Dark Data</title>
    <summary xml:lang="en">Lucid Imagination Combines Search, Analytics and Big Data to Tackle the Problem of Dark Data This post was too well written to break up as quotes/excerpts. I am re-posting it in full. Organizations today have little to no idea how much lost opportunity is hidden in the vast amounts of data they’ve collected and stored.  [...]</summary>
    <content type="xhtml" xml:lang="en"><div xmlns="http://www.w3.org/1999/xhtml"><p><a href="http://www.lucidimagination.com/about/news/releases/lucid-imagination-combines-search-analytics-and-big-data-tackle-dark-data">Lucid Imagination Combines Search, Analytics and Big Data to Tackle the Problem of Dark Data</a></p>
<p>This post was too well written to break up as quotes/excerpts. I am re-posting it in full.</p>
<blockquote>
<p>Organizations today have little to no idea how much lost opportunity is hidden in the vast amounts of data they’ve collected and stored.  They have entered the age of total data overload driven by the sheer amount of unstructured information, also called “dark” data, which is contained in their stored audio files, text messages, e-mail repositories, log files, transaction applications, and various other content stores.  And this dark data is continuing to grow, far outpacing the ability of the organization to track, manage and make sense of it.</p>
<p><a href="http://www.lucidimagination.com">Lucid Imagination</a>, a developer of search, discovery and analytics software based on Apache Lucene and Apache Solr technology, today unveiled <a href="http://www.lucidimagination.com/products/lucidworks-search-platform/lucidworks-big-data">LucidWorks Big Data</a><strong>™</strong>. LucidWorks Big Data is the industry’s first fully integrated development stack that combines the power of multiple open source projects including Hadoop, Mahout, R and Lucene/Solr to provide search, machine learning, recommendation engines and analytics for structured and unstructured content in one complete solution available in the cloud.</p>
<p><strong>Tweet This:</strong> Lucid Imagination combines #search, analytics and #BigData in complete stack. Beta now open <a href="http://ow.ly/aMHef">http://ow.ly/aMHef</a></p>
<p>With LucidWorks Big Data, Lucid Imagination equips technologists and business users with the ability to initially pilot Big Data projects utilizing technologies such as Apache Lucene/Solr, Mahout and Hadoop, in a cloud sandbox. Once satisfied, the project can remain in the cloud, be moved on premise or executed within a hybrid configuration.  This means they can avoid the staggering overhead costs and long lead times associated with infrastructure and application development lifecycles prior to placing their Big Data solution into production.</p>
<p>The product is now available in beta. To sign up for inclusion in the beta program, visit http://www.lucidimagination.com/products/lucidworks-search-platform/lucidworks-big-data.</p>
<p><strong>Dark Data Problem Is Real</strong></p>
<p>How big is the problem of dark data? The total amount of digital data in the world will reach 2.7 zettabytes in 2012, a 48 percent increase from 2011.* 90 percent of this data will be unstructured or “dark” data. Worldwide, 7.5 quintillion bytes of data, enough to fill over 100,000 Libraries of Congress get generated every day. Conversely, that deep volume of data can serve to help predict the weather, uncover consumer buying patterns or even ease traffic problems – if discovered and analyzed proactively.</p>
<p>“We see a strong opportunity for search to play a key role in the future of data management and analytics,” said Matthew Aslett, research manager, data management and analytics, 451 Research. “Lucid’s Big Data offering, and its combination of large-scale data storage in Hadoop with Lucene/Solr-based indexing and machine-learning capabilities, provides a platform for developing new applications to tackle emerging data management challenges.”</p>
<p><strong>LucidWorks Big Data</strong></p>
<p>Data analytics has traditionally been the domain of business intelligence technologies. Most of these tools, however, have been designed to handle structured data such as SQL, and cannot easily tap into the broad range of data types that can be used in a Big Data application. With the announcement of LucidWorks Big Data, organizations will be able to utilize a single platform for their Big Data search, discovery and analytics needs. LucidWorks Big Data is the only complete platform that:</p>
<ul>
<li>Combines the real time, ad hoc data accessibility of LucidWorks (Lucene/Solr) with compute and storage capabilities of Hadoop</li>
<li>Delivers commonly used analytic capabilities along with Mahout’s proven, scalable machine learning algorithms for deeper insight into both content and users</li>
<li>Tackles data, both big and small with ease, seamlessly scaling while minimizing the impact of provisioning Hadoop, LucidWorks and other components</li>
<li>Supplies a single, coherent, secure and well documented REST API for both application integration and administration</li>
<li>Offers fault tolerance with data safety baked in</li>
<li>Provides choice and flexibility, via on-premise, cloud-hosted or hybrid deployment solutions</li>
<li>Is tested, integrated and fully supported by the world’s leading experts in open source search</li>
<li>Includes powerful tools for configuration, deployment, content acquisition, security, and search experience, packaged in a convenient, well-organized application</li>
</ul>
<p>Lucid Imagination’s Open Search Platform uncovers real-time insights from any enterprise data, whether structured in databases, unstructured in formats such as emails or social channels, or semi-structured from sources such as websites.  The company’s rich portfolio of enterprise-grade solutions is based on the same proven open source Apache Lucene/Solr technology that powers many of the world’s largest e-commerce sites. Lucid Imagination’s on-premise and cloud platforms are quicker to deploy, cost less than competing products and are more easily tailored to specific needs than business intelligence solutions because they leverage innovation from the open source community.  </p>
<p>“We’re allowing a broad set of enterprises to test and implement data discovery and analysis projects that have historically been the province of large multinationals with large data centers. Cloud computing and LucidWorks Big Data finally level the field,” said Paul Doscher, CEO of Lucid Imagination. “Large companies, meanwhile, can use our Big Data stack to reduce the time and cost associated with evaluating and ultimately implementing big data search, discovery and analysis. It’s their data – now they can actually benefit from it.”</p>
</blockquote></div>
    </content>
    <updated>2012-05-13T23:39:16Z</updated>
    <published>2012-05-13T23:37:15Z</published>
    <category scheme="http://tm.durusau.net" term="BigData"/>
    <category scheme="http://tm.durusau.net" term="Lucene"/>
    <category scheme="http://tm.durusau.net" term="LucidWorks"/>
    <category scheme="http://tm.durusau.net" term="Solr"/>
    <author>
      <name>Patrick Durusau</name>
    </author>
    <source>
      <id>http://tm.durusau.net/?feed=atom</id>
      <link href="http://tm.durusau.net" rel="alternate" type="text/html"/>
      <link href="http://tm.durusau.net/?feed=atom" rel="self" type="application/atom+xml"/>
      <subtitle xml:lang="en">Patrick Durusau on Topic Maps and Semantic Diversity</subtitle>
      <title xml:lang="en">Another Word For It</title>
      <updated>2012-05-14T14:21:43Z</updated>
    </source>
  </entry>

  <entry xml:lang="en">
    <id>http://tm.durusau.net/?p=25150</id>
    <link href="http://tm.durusau.net/?p=25150" rel="alternate" type="text/html"/>
    <link href="http://tm.durusau.net/?p=25150#comments" rel="replies" type="text/html"/>
    <link href="http://tm.durusau.net/?feed=atom&amp;p=25150" rel="replies" type="application/atom+xml"/>
    <title xml:lang="en">Multilingual Natural Language Processing Applications: From Theory to Practice</title>
    <summary xml:lang="en">Multilingual Natural Language Processing Applications: From Theory to Practice by Daniel Bikel and Imed Zitouni. From the description: Multilingual Natural Language Processing Applications is the first comprehensive single-source guide to building robust and accurate multilingual NLP systems. Edited by two leading experts, it integrates cutting-edge advances with practical solutions drawn from extensive field experience. Part [...]</summary>
    <content type="xhtml" xml:lang="en"><div xmlns="http://www.w3.org/1999/xhtml"><p><a href="http://www.ibmpressbooks.com/bookstore/product.asp?isbn=0137151446">Multilingual Natural Language Processing Applications: From Theory to Practice</a> by Daniel Bikel and Imed Zitouni.</p>
<p>From the description:</p>
<blockquote>
<p><em><strong>Multilingual Natural Language Processing Applications</strong></em> is the first comprehensive single-source guide to building robust and accurate multilingual NLP systems. Edited by two leading experts, it integrates cutting-edge advances with practical solutions drawn from extensive field experience.</p>
<p>Part I introduces the core concepts and theoretical foundations of modern multilingual natural language processing, presenting today’s best practices for understanding word and document structure, analyzing syntax, modeling language, recognizing entailment, and detecting redundancy.</p>
<p>Part II thoroughly addresses the practical considerations associated with building real-world applications, including information extraction, machine translation, information retrieval/search, summarization, question answering, distillation, processing pipelines, and more.</p>
<p>This book contains important new contributions from leading researchers at IBM, Google, Microsoft, Thomson Reuters, BBN, CMU, University of Edinburgh, University of Washington, University of North Texas, and others.</p>
<p>Coverage includes:</p>
<ul>
<li>Core NLP problems, and today’s best algorithms for attacking them</li>
<li>Processing the diverse morphologies present in the world’s languages</li>
<li>Uncovering syntactical structure, parsing semantics, using semantic role labeling, and scoring grammaticality</li>
<li>Recognizing inferences, subjectivity, and opinion polarity</li>
<li>Managing key algorithmic and design tradeoffs in real-world applications</li>
<li>Extracting information via mention detection, coreference resolution, and events</li>
<li>Building large-scale systems for machine translation, information retrieval, and summarization</li>
<li>Answering complex questions through distillation and other advanced techniques</li>
<li>Creating dialog systems that leverage advances in speech recognition, synthesis, and dialog management</li>
<li>Constructing common infrastructure for multiple multilingual text processing applications</li>
</ul>
<p>This book will be invaluable for all engineers, software developers, researchers, and graduate students who want to process large quantities of text in multiple languages, in any environment: government, corporate, or academic.</p></blockquote>
<p>I could not bring myself to buy it for Carol (Mother’s Day), so I will have to wait for Father’s Day (June). <img alt=";-)" class="wp-smiley" src="http://tm.durusau.net/wp-includes/images/smilies/icon_wink.gif"/></p>
<p>If you get it before then, comments welcome!</p></div>
    </content>
    <updated>2012-05-13T17:35:53Z</updated>
    <published>2012-05-13T17:35:53Z</published>
    <category scheme="http://tm.durusau.net" term="Multilingual"/>
    <category scheme="http://tm.durusau.net" term="Natural Language Processing"/>
    <author>
      <name>Patrick Durusau</name>
    </author>
    <source>
      <id>http://tm.durusau.net/?feed=atom</id>
      <link href="http://tm.durusau.net" rel="alternate" type="text/html"/>
      <link href="http://tm.durusau.net/?feed=atom" rel="self" type="application/atom+xml"/>
      <subtitle xml:lang="en">Patrick Durusau on Topic Maps and Semantic Diversity</subtitle>
      <title xml:lang="en">Another Word For It</title>
      <updated>2012-05-14T14:21:43Z</updated>
    </source>
  </entry>

  <entry xml:lang="en">
    <id>http://tm.durusau.net/?p=25146</id>
    <link href="http://tm.durusau.net/?p=25146" rel="alternate" type="text/html"/>
    <link href="http://tm.durusau.net/?p=25146#comments" rel="replies" type="text/html"/>
    <link href="http://tm.durusau.net/?feed=atom&amp;p=25146" rel="replies" type="application/atom+xml"/>
    <title xml:lang="en">TREC 2012 Crowdsourcing Track</title>
    <summary xml:lang="en">TREC 2012 Crowdsourcing Track Panos Ipeirotis writes: TREC 2012 Crowdsourcing Track - Call for Participation June 2012 – November 2012 https://sites.google.com/site/treccrowd/ Goals As part of the National Institute of Standards and Technology (NIST)’s annual Text REtrieval Conference (TREC), the Crowdsourcing track investigates emerging crowd-based methods for search evaluation and/or developing hybrid automation and crowd search systems. [...]</summary>
    <content type="xhtml" xml:lang="en"><div xmlns="http://www.w3.org/1999/xhtml"><p><a href="http://www.behind-the-enemy-lines.com/2012/05/trec-2012-crowdsourcing-track.html">TREC 2012 Crowdsourcing Track</a> </p>
<p>Panos Ipeirotis writes:</p>
<blockquote><p>
<strong>TREC 2012 Crowdsourcing Track - Call for Participation<br/>
 June 2012 – November 2012</strong></p>
<p><a href="https://sites.google.com/site/treccrowd/">https://sites.google.com/site/treccrowd/</a></p>
<p><strong>Goals</strong></p>
<p>As part of the <a href="http://www.nist.gov/">National Institute of Standards and Technology (NIST)</a>’s annual <a href="http://trec.nist.gov/">Text REtrieval Conference (TREC)</a>, the Crowdsourcing track investigates emerging crowd-based methods for search evaluation and/or developing hybrid automation and crowd search systems.</p>
<p>This year, our goal is to evaluate approaches to crowdsourcing high quality relevance judgments for two different types of media:</p>
<ol>
<li>textual documents</li>
<li>images</li>
</ol>
<p>For each of the two tasks, participants will be expected to crowdsource relevance labels for approximately 20k topic-document pairs (i.e., 40k labels when taking part in both tasks). In the first task, the documents will be from an English news text corpus, while in the second task the documents will be images from Flickr and from a European news agency.</p>
<p>Participants may use any crowdsourcing methods and platforms, including home-grown systems. Submissions will be evaluated against a gold standard set of labels and against consensus labels over all participating teams.</p>
<p><strong>Tentative Schedule</strong></p>
<ul>
<li>Jun 1: Document corpora, training topics (for image task) and task guidelines available</li>
<li>Jul 1: Training labels for the image task</li>
<li>Aug 1: Test data released</li>
<li>Sep 15: Submissions due</li>
<li>Oct 1: Preliminary results released</li>
<li>Oct 15: Conference notebook papers due</li>
<li>Nov 6-9: TREC 2012 conference at NIST, Gaithersburg, MD, USA</li>
<li>Nov 15: Final results released</li>
<li>Jan 15, 2013: Final papers due</li>
</ul>
</blockquote>
<p>As you know, I am interested in crowd sourcing of paths through data and assignment of semantics.</p>
<p>Still, I am puzzled: why do we continue to emphasize post-creation assignment of semantics?</p>
<p>After data is created, we look around, surprised that it has no explicit semantics.</p>
<p>Like realizing you are on Main Street without your pants. </p>
<p>Why don’t we look to the data creation process to assign explicit semantics?</p>
<p>Thoughts? </p></div>
    </content>
    <updated>2012-05-12T23:22:50Z</updated>
    <published>2012-05-12T23:22:50Z</published>
    <category scheme="http://tm.durusau.net" term="Crowd Sourcing"/>
    <category scheme="http://tm.durusau.net" term="TREC"/>
    <author>
      <name>Patrick Durusau</name>
    </author>
    <source>
      <id>http://tm.durusau.net/?feed=atom</id>
      <link href="http://tm.durusau.net" rel="alternate" type="text/html"/>
      <link href="http://tm.durusau.net/?feed=atom" rel="self" type="application/atom+xml"/>
      <subtitle xml:lang="en">Patrick Durusau on Topic Maps and Semantic Diversity</subtitle>
      <title xml:lang="en">Another Word For It</title>
      <updated>2012-05-14T09:13:44Z</updated>
    </source>
  </entry>

  <entry xml:lang="en">
    <id>http://tm.durusau.net/?p=25143</id>
    <link href="http://tm.durusau.net/?p=25143" rel="alternate" type="text/html"/>
    <link href="http://tm.durusau.net/?p=25143#comments" rel="replies" type="text/html"/>
    <link href="http://tm.durusau.net/?feed=atom&amp;p=25143" rel="replies" type="application/atom+xml"/>
    <title xml:lang="en">Initial HTTP Speed+Mobility Open Source Prototype Now Available for Download</title>
    <summary xml:lang="en">Initial HTTP Speed+Mobility Open Source Prototype Now Available for Download From the post: Microsoft Open Technologies, Inc. has just published an initial open source prototype implementation of HTTP Speed+Mobility. The prototype is available for download on html5labs.com, where you will also find pointers to the source code. The IETF HTTPbis workgroup met in Paris at [...]</summary>
    <content type="xhtml" xml:lang="en"><div xmlns="http://www.w3.org/1999/xhtml"><p><a href="http://blogs.msdn.com/b/interoperability/archive/2012/05/11/news-from-ms-open-tech-initial-http-speed-mobility-open-source-prototype-now-available-for-download.aspx">Initial HTTP Speed+Mobility Open Source Prototype Now Available for Download</a></p>
<p>From the post:</p>
<blockquote><p><a href="http://blogs.msdn.com/b/interoperability/archive/2012/04/12/announcing-one-more-way-microsoft-will-engage-with-the-open-source-and-standards-communities.aspx" target="_blank">Microsoft Open Technologies, Inc.</a> has just published an initial open source prototype implementation of <a href="https://datatracker.ietf.org/doc/draft-montenegro-httpbis-speed-mobility/" target="_blank">HTTP Speed+Mobility</a>. The prototype is available for download on <a href="http://html5labs.interoperabilitybridges.com/prototypes/http-speed-plus-mobility/http-speed-plus-mobility/info" target="_blank">html5labs.com</a>, where you will also find pointers to the source code.</p>
<p>The <a href="https://datatracker.ietf.org/wg/httpbis/charter/" target="_blank">IETF HTTPbis workgroup</a> met in Paris at the <a href="http://www.ietf.org/meeting/83/index.html" target="_blank">end of March</a> to discuss how to approach HTTP 2.0 in order to meet the needs of an ever larger and more diverse web. It would be hard to downplay the importance of this work: it will impact how billions of devices communicate over the internet for years to come, from low-powered sensors, to mobile phones, to tablets, to PCs, to network switches, to the largest datacenters on the planet.</p>
<p>Prior to that IETF meeting, Jean Paoli and Sandeep Singhal announced in their post to the <a href="http://blogs.msdn.com/b/interoperability/archive/2012/03/25/speed-and-mobility-an-approach-for-http-2-0-to-make-mobile-apps-and-the-web-faster.aspx" target="_blank">Microsoft Interoperability blog</a> that Microsoft has contributed the <a href="https://datatracker.ietf.org/doc/draft-montenegro-httpbis-speed-mobility/" target="_blank">HTTP Speed+Mobility proposal</a> as input to that conversation.</p>
<p>The prototype implements the websocket-based session layer described in the proposal, as well as parts of the multiplexing logic incorporated from Google’s SPDY proposal. The code does not support header compression yet, but it will in upcoming refreshes.</p>
<p>The open source software comprises a client implemented in C# and a server implemented in Node.js running on Windows Azure. The client is a command line tool that establishes a connection to the server and can download a set of web pages that include html files, scripts, and images. We have made available on the server some static versions of popular web pages like <a href="http://www.microsoft.com" target="_blank">http://www.microsoft.com</a> and <a href="http://www.ietf.org" target="_blank">http://www.ietf.org</a>, as well as a handful of simpler test pages.</p>
</blockquote>
<p>I have avoided having a cell phone, much less a smartphone, all these years.</p>
<p>Now it looks like I am going to have to get one to evaluate and test semantic applications, including topic maps.</p>
<p>Thanks Jean and Sandeep! <img alt=";-)" class="wp-smiley" src="http://tm.durusau.net/wp-includes/images/smilies/icon_wink.gif"/>  </p></div>
    </content>
    <updated>2012-05-12T21:35:38Z</updated>
    <published>2012-05-12T21:35:38Z</published>
    <category scheme="http://tm.durusau.net" term="HTTP Speed+Mobility"/>
    <category scheme="http://tm.durusau.net" term="Interface Research/Design"/>
    <author>
      <name>Patrick Durusau</name>
    </author>
    <source>
      <id>http://tm.durusau.net/?feed=atom</id>
      <link href="http://tm.durusau.net" rel="alternate" type="text/html"/>
      <link href="http://tm.durusau.net/?feed=atom" rel="self" type="application/atom+xml"/>
      <subtitle xml:lang="en">Patrick Durusau on Topic Maps and Semantic Diversity</subtitle>
      <title xml:lang="en">Another Word For It</title>
      <updated>2012-05-14T09:13:44Z</updated>
    </source>
  </entry>

  <entry xml:lang="en">
    <id>http://tm.durusau.net/?p=25140</id>
    <link href="http://tm.durusau.net/?p=25140" rel="alternate" type="text/html"/>
    <link href="http://tm.durusau.net/?p=25140#comments" rel="replies" type="text/html"/>
    <link href="http://tm.durusau.net/?feed=atom&amp;p=25140" rel="replies" type="application/atom+xml"/>
    <title xml:lang="en">Paxos Made Moderately Complex</title>
    <summary xml:lang="en">Paxos Made Moderately Complex From the post: If you are a normal human being and find the Paxos protocol confusing, then this paper, Paxos Made Moderately Complex, is a great find. Robbert van Renesse from Cornell University has written a clear and well written paper with excellent explanations. The Abstract: For anybody who has ever [...]</summary>
    <content type="xhtml" xml:lang="en"><div xmlns="http://www.w3.org/1999/xhtml"><p><a href="http://highscalability.com/blog/2012/5/10/paper-paxos-made-moderately-complex.html">Paxos Made Moderately Complex</a></p>
<p>From the post:</p>
<blockquote><p>If you are a normal human being and find the <a href="http://en.wikipedia.org/wiki/Paxos_(computer_science)">Paxos protocol</a> confusing, then this paper, <a href="http://www.cs.cornell.edu/courses/cs7412/2011sp/paxos.pdf">Paxos Made Moderately Complex</a>, is a great find. Robbert van Renesse from Cornell University has written a clear and well written paper with excellent explanations.</p>
<p>The Abstract:</p>
<blockquote><p>For anybody who has ever tried to implement it, Paxos is by no means a simple protocol, even though it is based on relatively simple invariants. This paper provides imperative pseudo-code for the full Paxos (or Multi-Paxos) protocol without shying away from discussing various implementation details. The initial description avoids optimizations that complicate comprehension. Next we discuss liveness, and list various optimizations that make the protocol practical.</p></blockquote>
</blockquote>
<p>If you need safety (“freedom from inconsistency”) and fault-tolerant topic map results, you may want to spend some quality time with this paper. </p>
<p>As with most things, user requirements are going to drive the choices you have to make. </p>
<p>It is hard for me to see how a “loosely consistent” merging system is useful, but for TV entertainment data that may be enough. Who is sleeping with whom probably has lag time in reporting anyway.</p>
<p>For more serious data, Paxos may be your protocol of choice. </p></div>
    </content>
    <updated>2012-05-12T21:16:01Z</updated>
    <published>2012-05-12T21:16:01Z</published>
    <category scheme="http://tm.durusau.net" term="Paxos"/>
    <category scheme="http://tm.durusau.net" term="Scalability"/>
    <author>
      <name>Patrick Durusau</name>
    </author>
    <source>
      <id>http://tm.durusau.net/?feed=atom</id>
      <link href="http://tm.durusau.net" rel="alternate" type="text/html"/>
      <link href="http://tm.durusau.net/?feed=atom" rel="self" type="application/atom+xml"/>
      <subtitle xml:lang="en">Patrick Durusau on Topic Maps and Semantic Diversity</subtitle>
      <title xml:lang="en">Another Word For It</title>
      <updated>2012-05-14T09:13:44Z</updated>
    </source>
  </entry>

  <entry xml:lang="en">
    <id>http://tm.durusau.net/?p=25137</id>
    <link href="http://tm.durusau.net/?p=25137" rel="alternate" type="text/html"/>
    <link href="http://tm.durusau.net/?p=25137#comments" rel="replies" type="text/html"/>
    <link href="http://tm.durusau.net/?feed=atom&amp;p=25137" rel="replies" type="application/atom+xml"/>
    <title xml:lang="en">Cell Architectures (adding dashes of heterogeneity)</title>
    <summary xml:lang="en">Cell Architectures From the post: A consequence of Service Oriented Architectures is the burning need to provide services at scale. The architecture that has evolved to satisfy these requirements is a little known technique called the Cell Architecture. A Cell Architecture is based on the idea that massive scale requires parallelization and parallelization requires components [...]</summary>
    <content type="xhtml" xml:lang="en"><div xmlns="http://www.w3.org/1999/xhtml"><p><a href="http://highscalability.com/blog/2012/5/9/cell-architectures.html">Cell Architectures</a></p>
<p>From the post:</p>
<blockquote><p>A consequence of Service Oriented Architectures is the burning need to provide services at scale. The architecture that has evolved to satisfy these requirements is a little known technique called the Cell Architecture.</p>
<p>A Cell Architecture is based on the idea that massive scale requires parallelization and parallelization requires components be isolated from each other. These islands of isolation are called cells. A cell is a self-contained installation that can satisfy all the operations for a <a href="http://highscalability.com/unorthodox-approach-database-design-coming-shard">shard</a>. A shard is a subset of a much larger dataset, typically a range of users, for example. </p>
<p>Cell Architectures have several advantages:</p>
<ul>
<li>Cells provide a unit of parallelization that can be adjusted to any size as the user base grows.</li>
<li>Cells are added in an incremental fashion as more capacity is required.</li>
<li>Cells isolate failures. One cell failure does not impact other cells.</li>
<li>Cells provide isolation as the storage and application horsepower to process requests is independent of other cells.</li>
<li>Cells enable nice capabilities like the ability to test upgrades, implement rolling upgrades, and test different versions of software.</li>
<li>Cells can fail, be upgraded, and be distributed across datacenters independently of other cells.</li>
</ul>
</blockquote>
<p>The intersection of semantic heterogeneity and scaling remains largely unexplored. </p>
<p>I suggest scaling in a homogeneous environment and then adding dashes of heterogeneity to see what breaks. </p>
<p>Adjust and try again.</p></div>
    </content>
    <updated>2012-05-12T21:01:31Z</updated>
    <published>2012-05-12T21:01:31Z</published>
    <category scheme="http://tm.durusau.net" term="Cell Architecture"/>
    <category scheme="http://tm.durusau.net" term="Heterogeneous Data"/>
    <category scheme="http://tm.durusau.net" term="Heterogeneous Programming"/>
    <category scheme="http://tm.durusau.net" term="Parallelism"/>
    <category scheme="http://tm.durusau.net" term="Scalability"/>
    <author>
      <name>Patrick Durusau</name>
    </author>
    <source>
      <id>http://tm.durusau.net/?feed=atom</id>
      <link href="http://tm.durusau.net" rel="alternate" type="text/html"/>
      <link href="http://tm.durusau.net/?feed=atom" rel="self" type="application/atom+xml"/>
      <subtitle xml:lang="en">Patrick Durusau on Topic Maps and Semantic Diversity</subtitle>
      <title xml:lang="en">Another Word For It</title>
      <updated>2012-05-14T03:10:14Z</updated>
    </source>
  </entry>

  <entry xml:lang="en">
    <id>http://tm.durusau.net/?p=25134</id>
    <link href="http://tm.durusau.net/?p=25134" rel="alternate" type="text/html"/>
    <link href="http://tm.durusau.net/?p=25134#comments" rel="replies" type="text/html"/>
    <link href="http://tm.durusau.net/?feed=atom&amp;p=25134" rel="replies" type="application/atom+xml"/>
    <title xml:lang="en">Outlier detection in two review articles (Part 1)</title>
    <summary xml:lang="en">Outlier detection in two review articles (Part 1) by Sandro Saitta. Sandro writes: The first one, Outlier Detection: A Survey, is written by Chandola, Banerjee and Kumar. They define outlier detection as the problem of “[...] finding patterns in data that do not conform to expected normal behavior”. After an introduction to what outliers are, [...]</summary>
    <content type="xhtml" xml:lang="en"><div xmlns="http://www.w3.org/1999/xhtml"><p><a href="http://www.dataminingblog.com/outlier-detection-in-two-review-articles-part-1/">Outlier detection in two review articles (Part 1)</a> by Sandro Saitta.</p>
<p>Sandro writes:</p>
<blockquote><p>The first one, <em>Outlier Detection: A Survey</em>, is written by Chandola, Banerjee and Kumar. They define outlier detection as the problem of “[...] finding patterns in data that do not conform to expected normal behavior”. After an introduction to what outliers are, the authors present current challenges in this field. In my experience, non-availability of labeled data is a major one.</p>
<p>…</p>
<p>One of their main conclusions is that “<em>[...] outlier detection is not a well-formulated problem</em>”. It is your job, as a data miner, to formulate it correctly.</p></blockquote>
<p>The final quote seems particularly well suited to subject identity issues. While any one subject identity may be well defined, the question is how to find and manage other subject identifications that may not be well defined. </p>
<p>As Sandro points out, the survey has nineteen (19) pages of references. However, only nine of those references are as recent as 2007; all the rest are older. I am sure it remains an excellent reference source, but I suspect more recent review articles on outlier detection exist.</p>
<p>Suggestions?  </p></div>
    </content>
    <updated>2012-05-12T20:38:49Z</updated>
    <published>2012-05-12T20:38:49Z</published>
    <category scheme="http://tm.durusau.net" term="Data Mining"/>
    <category scheme="http://tm.durusau.net" term="Outlier Detection"/>
    <author>
      <name>Patrick Durusau</name>
    </author>
    <source>
      <id>http://tm.durusau.net/?feed=atom</id>
      <link href="http://tm.durusau.net" rel="alternate" type="text/html"/>
      <link href="http://tm.durusau.net/?feed=atom" rel="self" type="application/atom+xml"/>
      <subtitle xml:lang="en">Patrick Durusau on Topic Maps and Semantic Diversity</subtitle>
      <title xml:lang="en">Another Word For It</title>
      <updated>2012-05-14T00:54:39Z</updated>
    </source>
  </entry>

  <entry xml:lang="en">
    <id>http://tm.durusau.net/?p=25131</id>
    <link href="http://tm.durusau.net/?p=25131" rel="alternate" type="text/html"/>
    <link href="http://tm.durusau.net/?p=25131#comments" rel="replies" type="text/html"/>
    <link href="http://tm.durusau.net/?feed=atom&amp;p=25131" rel="replies" type="application/atom+xml"/>
    <title xml:lang="en">CDH3 update 4 is now available</title>
    <summary xml:lang="en">CDH3 update 4 is now available by David Wang. From the post: We are happy to officially announce the general availability of CDH3 update 4. This update consists primarily of reliability enhancements as well as a number of minor improvements. First, there have been a few notable HBase updates. In this release, we’ve upgraded Apache [...]</summary>
    <content type="xhtml" xml:lang="en"><div xmlns="http://www.w3.org/1999/xhtml"><p><a href="http://www.cloudera.com/blog/2012/05/cdh3-update-4-is-now-available/">CDH3 update 4 is now available</a> by David Wang.</p>
<p>From the post:</p>
<blockquote><p>We are happy to officially announce the general availability of CDH3 update 4. This update consists primarily of reliability enhancements as well as a number of minor improvements.</p>
<p>First, there have been a few notable HBase updates. In this release, we’ve upgraded Apache HBase to upstream version 0.90.6, improving system robustness and availability. Also, some of the recent hbck changes were incorporated to better detect and handle various types of corruptions. Lastly, HDFS append support is now disabled by default in this release as it is no longer needed for HBase. Please see <a href="https://ccp.cloudera.com/display/CDHDOC/Known+Issues+and+Work+Arounds+in+CDH3" target="_blank" title="CDH3 update 4 Known Issues and Workarounds">the CDH3 Known Issues and Workarounds page</a> for details.</p>
<p>In addition to the HBase updates, CDH3 update 4 also includes the latest release of Apache Flume (incubating) – version 1.1.0. A detailed description of what it brings to the table is found <a href="http://www.cloudera.com/blog/2011/12/apache-flume-architecture-of-flume-ng-2/" target="_blank" title="Flume NG blog post">in a previous Cloudera blog post describing its architecture</a>. Please note that we will continue to ship Flume 0.9.4 as well.</p>
</blockquote></div>
    </content>
    <updated>2012-05-12T20:24:31Z</updated>
    <published>2012-05-12T20:24:31Z</published>
    <category scheme="http://tm.durusau.net" term="Flume"/>
    <category scheme="http://tm.durusau.net" term="HBase"/>
    <category scheme="http://tm.durusau.net" term="Hadoop"/>
    <category scheme="http://tm.durusau.net" term="MapReduce"/>
    <author>
      <name>Patrick Durusau</name>
    </author>
    <source>
      <id>http://tm.durusau.net/?feed=atom</id>
      <link href="http://tm.durusau.net" rel="alternate" type="text/html"/>
      <link href="http://tm.durusau.net/?feed=atom" rel="self" type="application/atom+xml"/>
      <subtitle xml:lang="en">Patrick Durusau on Topic Maps and Semantic Diversity</subtitle>
      <title xml:lang="en">Another Word For It</title>
      <updated>2012-05-14T00:54:39Z</updated>
    </source>
  </entry>

  <entry xml:lang="en">
    <id>http://tm.durusau.net/?p=25127</id>
    <link href="http://tm.durusau.net/?p=25127" rel="alternate" type="text/html"/>
    <link href="http://tm.durusau.net/?p=25127#comments" rel="replies" type="text/html"/>
    <link href="http://tm.durusau.net/?feed=atom&amp;p=25127" rel="replies" type="application/atom+xml"/>
    <title xml:lang="en">Meta-tools for exploring explanations</title>
    <summary xml:lang="en">Meta-tools for exploring explanations Jon Udell writes: At the Canadian University Software Engineering Conference in January, Bret Victor gave a brilliant presentation that continues to resonate in the technical community. No programmer could fail to be inspired by Bret’s vision, which he compellingly demonstrated, of a system that makes software abstractions visual, concrete, and directly [...]</summary>
    <content type="xhtml" xml:lang="en"><div xmlns="http://www.w3.org/1999/xhtml"><p><a href="http://blog.jonudell.net/2012/05/08/meta-tools-for-exploring-explanations/">Meta-tools for exploring explanations</a></p>
<p>Jon Udell writes:</p>
<blockquote><p>At the Canadian University Software Engineering Conference in January, <a href="http://worrydream.com">Bret Victor</a> gave a <a href="http://vimeo.com/36579366">brilliant presentation</a> that continues to resonate in the technical community. No programmer could fail to be inspired by Bret’s vision, which he compellingly demonstrated, of a system that makes software abstractions visual, concrete, and directly manipulable. Among the inspired were Eric Maupin and Chris Granger, both of whom quickly came up with their own implementations — in <a href="http://ermau.com/making-instant-c-viable-part-1/">C#</a> and <a href="http://www.chris-granger.com/2012/02/26/connecting-to-your-creation/">ClojureScript</a> — of the ideas Bret Victor had fleshed out in JavaScript.</p>
</blockquote>
<p>Here is an example of the sort of problem Jon thinks we can address:</p>
<blockquote><p>We need robust explorable explanations that state assumptions, link to supporting data, and assemble context that enables us to cross-check assumptions and evaluate consequences.</p>
<p>And we need them everywhere, for everything. Consider, for example, the current debate about fracking. We’re having this conversation because, as Daniel Yergin explains in <em>The Quest</em>, a natural gas revolution has gotten underway pretty recently. There’s a lot more of it available than was thought, particularly in North America, and we can recover it and burn it a lot more cleanly than the coal that generates so much of our electric power. Are there tradeoffs? Of course. There are always tradeoffs. What cripples us is our inability to evaluate them. We isolate every issue, and then polarize it. Economist Ed Dolan writes:</p>
<blockquote><p>These anti-frackers have a simple solution: ban it.</p>
<p>The pro-frackers, too, have a simple solution: get the government out of the way and drill baby, drill.</p>
<p>The environmental impacts of fracking are a real problem, but one to which neither prohibition nor laissez faire seems a sensible solution. Instead, we should look toward mitigation of impacts using economic tools that have been applied successfully in the case of other environmental harms.</p></blockquote>
<p>In order to do that, we’ve got to be able to put people in both camps in front of an explorable explanation with a slider that varies how much natural gas we choose to produce, linked to other sliders that vary what we pay, in dollars, lives, and environmental impact, not only for fracking but also for coal production and use, for Middle East wars, and so on.</p></blockquote>
<p>Whatever your position on mapping discussions and dialogues, you will find this an interesting essay.</p>
<p>Jon points to other resources by Bret Victor:</p>
<p><a href="http://worrydream.com/ExplorableExplanations/">Explorable Explanations</a> (essay)</p>
<p><a href="http://worrydream.com/TenBrighterIdeas/">Ten Brighter Ideas</a> (demo for Explorable Explanations)</p>
<p><a href="http://worrydream.com/MagicInk/">Magic Ink</a> (book-length essay, 2006)</p></div>
    </content>
    <updated>2012-05-12T20:16:46Z</updated>
    <published>2012-05-12T20:16:46Z</published>
    <category scheme="http://tm.durusau.net" term="Graphics"/>
    <category scheme="http://tm.durusau.net" term="Visualization"/>
    <author>
      <name>Patrick Durusau</name>
    </author>
    <source>
      <id>http://tm.durusau.net/?feed=atom</id>
      <link href="http://tm.durusau.net" rel="alternate" type="text/html"/>
      <link href="http://tm.durusau.net/?feed=atom" rel="self" type="application/atom+xml"/>
      <subtitle xml:lang="en">Patrick Durusau on Topic Maps and Semantic Diversity</subtitle>
      <title xml:lang="en">Another Word For It</title>
      <updated>2012-05-13T23:48:53Z</updated>
    </source>
  </entry>

  <entry xml:lang="en">
    <id>http://tm.durusau.net/?p=25114</id>
    <link href="http://tm.durusau.net/?p=25114" rel="alternate" type="text/html"/>
    <link href="http://tm.durusau.net/?p=25114#comments" rel="replies" type="text/html"/>
    <link href="http://tm.durusau.net/?feed=atom&amp;p=25114" rel="replies" type="application/atom+xml"/>
    <title xml:lang="en">Data journalism handbook: Tips for Working with Numbers in the News</title>
    <summary xml:lang="en">Michael Blastland writes in Data journalism handbook: Tips for Working with Numbers in the News some short tips that will ease you towards becoming a data journalist. You might want to print out Michael’s tips and keep them close at hand. After a while you may want to add your own tips about particular data [...]</summary>
    <content type="xhtml" xml:lang="en"><div xmlns="http://www.w3.org/1999/xhtml"><p>In <a href="http://www.guardian.co.uk/news/datablog/2012/apr/28/data-journalism-handbook-tips">Data journalism handbook: Tips for Working with Numbers in the News</a>, Michael Blastland offers some short tips that will ease you toward becoming a data journalist.</p>
<p>You might want to print out Michael’s tips and keep them close at hand. </p>
<p>After a while you may want to add your own tips about particular data sources. </p>
<p>Or better yet, share them with others!</p>
<p>Oh, and don’t miss the <a href="http://datajournalismhandbook.org/">Data Journalism Handbook</a> itself.</p></div>
    </content>
    <updated>2012-05-11T23:40:12Z</updated>
    <published>2012-05-11T23:40:12Z</published>
    <category scheme="http://tm.durusau.net" term="Data"/>
    <category scheme="http://tm.durusau.net" term="News"/>
    <author>
      <name>Patrick Durusau</name>
    </author>
    <source>
      <id>http://tm.durusau.net/?feed=atom</id>
      <link href="http://tm.durusau.net" rel="alternate" type="text/html"/>
      <link href="http://tm.durusau.net/?feed=atom" rel="self" type="application/atom+xml"/>
      <subtitle xml:lang="en">Patrick Durusau on Topic Maps and Semantic Diversity</subtitle>
      <title xml:lang="en">Another Word For It</title>
      <updated>2012-05-13T17:35:53Z</updated>
    </source>
  </entry>

  <entry xml:lang="en">
    <id>http://tm.durusau.net/?p=25122</id>
    <link href="http://tm.durusau.net/?p=25122" rel="alternate" type="text/html"/>
    <link href="http://tm.durusau.net/?p=25122#comments" rel="replies" type="text/html"/>
    <link href="http://tm.durusau.net/?feed=atom&amp;p=25122" rel="replies" type="application/atom+xml"/>
    <title xml:lang="en">Clustering by hypergraphs and dimensionality of cluster systems</title>
    <summary xml:lang="en">Clustering by hypergraphs and dimensionality of cluster systems by S. Albeverio and S.V. Kozyrev. Abstract: In the present paper we discuss the clustering procedure in the case where instead of a single metric we have a family of metrics. In this case we can obtain a partially ordered graph of clusters which is not necessarily [...]</summary>
    <content type="xhtml" xml:lang="en"><div xmlns="http://www.w3.org/1999/xhtml"><p><a href="http://arxiv.org/abs/1204.5952">Clustering by hypergraphs and dimensionality of cluster systems</a> by S. Albeverio and S.V. Kozyrev.</p>
<p>Abstract:</p>
<blockquote><p>In the present paper we discuss the clustering procedure in the case where instead of a single metric we have a family of metrics. In this case we can obtain a partially ordered graph of clusters which is not necessarily a tree. We discuss a structure of a hypergraph above this graph. We propose two definitions of dimension for hyperedges of this hypergraph and show that for the multidimensional p-adic case both dimensions are reduced to the number of p-adic parameters.</p>
<p>We discuss the application of the hypergraph clustering procedure to the construction of phylogenetic graphs in biology. In this case the dimension of a hyperedge will describe the number of sources of genetic diversity. </p></blockquote>
<p>A pleasant reminder that hypergraphs and hyperedges are simplifications of the complexity we find in nature.</p>
<p>If hypergraphs/hyperedges are simplifications, what would you call a graph/edges?</p>
<p>A simplification of a simplification? </p>
<p>Graphs are useful sometimes.</p>
<p>Useful sometimes doesn’t mean useful at all times. </p></div>
    </content>
    <updated>2012-05-11T23:39:27Z</updated>
    <published>2012-05-11T23:39:27Z</published>
    <category scheme="http://tm.durusau.net" term="Clustering"/>
    <category scheme="http://tm.durusau.net" term="Graphs"/>
    <category scheme="http://tm.durusau.net" term="Hyperedges"/>
    <category scheme="http://tm.durusau.net" term="Hypergraphs"/>
    <author>
      <name>Patrick Durusau</name>
    </author>
    <source>
      <id>http://tm.durusau.net/?feed=atom</id>
      <link href="http://tm.durusau.net" rel="alternate" type="text/html"/>
      <link href="http://tm.durusau.net/?feed=atom" rel="self" type="application/atom+xml"/>
      <subtitle xml:lang="en">Patrick Durusau on Topic Maps and Semantic Diversity</subtitle>
      <title xml:lang="en">Another Word For It</title>
      <updated>2012-05-13T17:35:53Z</updated>
    </source>
  </entry>

  <entry xml:lang="en">
    <id>http://tm.durusau.net/?p=25119</id>
    <link href="http://tm.durusau.net/?p=25119" rel="alternate" type="text/html"/>
    <link href="http://tm.durusau.net/?p=25119#comments" rel="replies" type="text/html"/>
    <link href="http://tm.durusau.net/?feed=atom&amp;p=25119" rel="replies" type="application/atom+xml"/>
    <title xml:lang="en">Picard and Dathon at El-Adrel</title>
    <summary xml:lang="en">Orri Erling’s account of meeting Bryan Thompson reminded me of Picard and Dathon at El-Adrel, albeit with happier results. See what you think: I gave an invited talk (“Virtuoso 7 – Column Store and Adaptive Techniques for Graph” (Slides (ppt))) at the Graph Data Management Workshop at ICDE 2012. Bryan Thompson of Systap (Bigdata® [...]</summary>
    <content type="xhtml" xml:lang="en"><div xmlns="http://www.w3.org/1999/xhtml"><p><a href="http://www.openlinksw.com/weblog/oerling/?id=1705">Orri Erling’s account of meeting Bryan Thompson</a> reminded me of <a href="http://en.wikipedia.org/wiki/Darmok">Picard and Dathon at El-Adrel</a>, albeit with happier results.</p>
<p>See what you think:</p>
<blockquote><p>
I gave an invited talk (“Virtuoso 7 – Column Store and Adaptive Techniques for Graph” (<a href="http://www.cse.unsw.edu.au/~iwgdm/2012/Slides/Virtuoso.ppt">Slides (ppt)</a>)) at the <a href="http://www.cse.unsw.edu.au/~iwgdm/2012/">Graph Data Management Workshop</a> at <a href="http://www.icde12.org/Site/">ICDE 2012</a>.</p>
<p>Bryan Thompson of <a href="http://www.systap.com/">Systap</a> (<a href="http://www.systap.com/bigdata.htm">Bigdata®</a> RDF store) was also invited, so we got to talk about our common interests. He told me about two cool things they have recently done, namely introducing tables to <a href="http://dbpedia.org/resource/SPARQL">SPARQL</a>, and adding a way of <a href="http://dbpedia.org/resource/Reification_%28computer_science%29">reifying statements</a> that does not rely on extra columns. The table business is just about being able to store a multicolumn result set into a named persistent entity for subsequent processing. But this amounts to a SQL table, so the relational model has been re-arrived at, once more, from practical considerations. The reification just packs all the fields of a triple (or quad) into a single string and this string is then used as an RDF S or O (Subject or Object), less frequently a P or G (Predicate or Graph). This works because Bigdata® has variable length fields in all columns of the triple/quad table. The query notation then accepts a function-looking thing in a triple pattern to mark reification. Nice. Virtuoso has a variable length column in only the O but could of course have one in also S and even in P and G. The column store would still compress the same as long as reified values did not occur. These values on the other hand would be unlikely to compress very well but run length and dictionary would always work.</p>
<p>So, we could do it like Bigdata®, or we could add a “quad ID” column to one of the indices, to give a reification ID to quads. Again no penalty in a column store, if you do not access the column. Or we could make an extra table of PSOG-&gt;R.</p>
<p>Yet another variation would be to make the SPOG concatenation a literal that is interned in the RDF literal table, and then used as any literal would be in the O, and as an IRI in a special range when occurring as S. The relative merits depend on how often something will be reified and on whether one wishes to SELECT based on parts of reification. Whichever the case may be, the idea of a function-looking placeholder for a reification is a nice one and we should make a compatible syntax if we do special provenance/reification support. The model in the RDF reification vocabulary is a non-starter and a thing to discredit the sem web for anyone from database.
</p></blockquote>
<p>Pushing past the metaphors, it sounds like both Orri and Bryan are working on interesting projects. <img alt=";-)" class="wp-smiley" src="http://tm.durusau.net/wp-includes/images/smilies/icon_wink.gif"/> </p></div>
    </content>
    <updated>2012-05-11T21:50:20Z</updated>
    <published>2012-05-11T21:50:20Z</published>
    <category scheme="http://tm.durusau.net" term="Graphs"/>
    <category scheme="http://tm.durusau.net" term="SQL"/>
    <category scheme="http://tm.durusau.net" term="bigdata&#xAE;"/>
    <author>
      <name>Patrick Durusau</name>
    </author>
    <source>
      <id>http://tm.durusau.net/?feed=atom</id>
      <link href="http://tm.durusau.net" rel="alternate" type="text/html"/>
      <link href="http://tm.durusau.net/?feed=atom" rel="self" type="application/atom+xml"/>
      <subtitle xml:lang="en">Patrick Durusau on Topic Maps and Semantic Diversity</subtitle>
      <title xml:lang="en">Another Word For It</title>
      <updated>2012-05-12T23:22:50Z</updated>
    </source>
  </entry>

  <entry xml:lang="en">
    <id>http://tm.durusau.net/?p=25116</id>
    <link href="http://tm.durusau.net/?p=25116" rel="alternate" type="text/html"/>
    <link href="http://tm.durusau.net/?p=25116#comments" rel="replies" type="text/html"/>
    <link href="http://tm.durusau.net/?feed=atom&amp;p=25116" rel="replies" type="application/atom+xml"/>
    <title xml:lang="en">Nuts and Bolts of Data Mining: Correlation &amp; Scatter Plots</title>
    <summary xml:lang="en">Nuts and Bolts of Data Mining: Correlation &amp; Scatter Plots by Tim Graettinger. From the post: In this article, I continue the “Nuts and Bolts of Data Mining” series. We will tackle two, intertwined tools/topics this time: correlation and scatter plots. These tools are fundamental for gauging the relationship (if any) between pairs of data [...]</summary>
    <content type="xhtml" xml:lang="en"><div xmlns="http://www.w3.org/1999/xhtml"><p><a href="http://www.discoverycorpsinc.com/correlation-and-scatter-plots/">Nuts and Bolts of Data Mining: Correlation &amp; Scatter Plots</a> by Tim Graettinger.</p>
<p>From the post:</p>
<blockquote><p>In this article, I continue the “Nuts and Bolts of Data Mining” series.  We will tackle two, intertwined tools/topics this time: correlation and scatter plots.  These tools are fundamental for gauging the relationship (if any) between pairs of data elements.  For instance, you might want to view the relationship between the age and income of your customers as a scatter plot.  Or, you might compute a number that is the correlation between these two customer demographics.  As we’ll soon see, there are good, bad, and ugly things that can happen when you apply a purely computational method like correlation.  My goal is to help you avoid the usual pitfalls, so that you can use correlation and scatter plots effectively in your own work.</p></blockquote>
<p>You will smile at the examples, but if the popular press is any indication, correlation is no laughing matter!</p>
<p>Tim’s post won’t turn the tide, but it is short enough to forward to the folks at your local broadside.</p></div>
    </content>
    <updated>2012-05-11T21:19:41Z</updated>
    <published>2012-05-11T21:19:41Z</published>
    <category scheme="http://tm.durusau.net" term="Correlation"/>
    <category scheme="http://tm.durusau.net" term="Statistics"/>
    <author>
      <name>Patrick Durusau</name>
    </author>
    <source>
      <id>http://tm.durusau.net/?feed=atom</id>
      <link href="http://tm.durusau.net" rel="alternate" type="text/html"/>
      <link href="http://tm.durusau.net/?feed=atom" rel="self" type="application/atom+xml"/>
      <subtitle xml:lang="en">Patrick Durusau on Topic Maps and Semantic Diversity</subtitle>
      <title xml:lang="en">Another Word For It</title>
      <updated>2012-05-12T21:35:38Z</updated>
    </source>
  </entry>

  <entry xml:lang="en">
    <id>http://tm.durusau.net/?p=25108</id>
    <link href="http://tm.durusau.net/?p=25108" rel="alternate" type="text/html"/>
    <link href="http://tm.durusau.net/?p=25108#comments" rel="replies" type="text/html"/>
    <link href="http://tm.durusau.net/?feed=atom&amp;p=25108" rel="replies" type="application/atom+xml"/>
    <title xml:lang="en">Who Do You Say You Are?</title>
    <summary xml:lang="en">In Data Governance in Context, Jim Ericson outlines several paths of data governance, or as I put it: Who Do You Say You Are?: On one path, more enterprises are dead serious about creating and using data they can trust and verify. It’s a simple equation. Data that isn’t properly owned and operated can’t be [...]</summary>
    <content type="xhtml" xml:lang="en"><div xmlns="http://www.w3.org/1999/xhtml"><p>In <a href="http://www.information-management.com/blogs/data-governance-ungoverned-data-data-management-data-discovery-10022389-1.html">Data Governance in Context</a>, Jim Ericson outlines several paths of data governance, or, as I put it, “Who Do You Say You Are?”:</p>
<blockquote><p>On one path, more enterprises are dead serious about creating and using data they can trust and verify. It’s a simple equation. Data that isn’t properly owned and operated can’t be used for regulatory work, won’t be trusted to make significant business decisions and will never have the value organizations keep wanting to ascribe it on the balance sheet. We now know instinctively that with correct and thorough information, we can jump on opportunities, unite our understanding and steer the business better than before.</p>
<p>On a similar path, we embrace tested data in the marketplace (see Experian, D&amp;B, etc.) that is trusted for a use case even if it does not conform to internal standards. Nothing wrong with that either.</p>
<p>And on yet another path (and areas between) it’s exploration and discovery of data that might engage huge general samples of data with imprecise value.</p>
<p>It’s clear that we cannot and won’t have the same governance standards for all the different data now facing an enterprise.</p>
<p>For starters, crowd sourced and third party data bring a new dimension, because “fitness for purpose” is by definition a relative term. You don’t need or want the same standard for how many thousands or millions of visitors used a website feature or clicked on a bundle in the way you maintain your customer or financial info.</p></blockquote>
<p>Do mortgage-backed securities fall into the category of “…huge general samples of data with imprecise value”? I ask because I don’t work in the financial industry. Or does that industry not practice data governance, except to generate numbers for the auditors?</p>
<p>I mention this because I suspect that subject identity governance would be equally useful for topic map authoring. </p>
<p>Some topic maps, say on drug trials, need a high degree of reliability and auditability, as well as precise identification (even if double-blind) of the test subjects.</p>
<p>Or there may be different tests for subject identity, some of which appear less precise than others.</p>
<p>For example, you might merge all the topics entered by a particular operator in a day to look for patterns that may indicate the operator is not following data entry protocols. (It is hard to be as random as real data.)</p>
<p>As with most issues, there isn’t any hard and fast rule that works for all cases. You do need to document the rules you are following and for how long. It will help you test old rules and to formulate new ones. (“Document” meaning to write down. The vagaries of memory are insufficient.)</p></div>
    </content>
    <updated>2012-05-11T20:55:08Z</updated>
    <published>2012-05-11T20:55:08Z</published>
    <category scheme="http://tm.durusau.net" term="Data Governance"/>
    <category scheme="http://tm.durusau.net" term="Identification"/>
    <category scheme="http://tm.durusau.net" term="Identity"/>
    <author>
      <name>Patrick Durusau</name>
    </author>
    <source>
      <id>http://tm.durusau.net/?feed=atom</id>
      <link href="http://tm.durusau.net" rel="alternate" type="text/html"/>
      <link href="http://tm.durusau.net/?feed=atom" rel="self" type="application/atom+xml"/>
      <subtitle xml:lang="en">Patrick Durusau on Topic Maps and Semantic Diversity</subtitle>
      <title xml:lang="en">Another Word For It</title>
      <updated>2012-05-12T21:16:01Z</updated>
    </source>
  </entry>

  <entry xml:lang="en">
    <id>http://tm.durusau.net/?p=25103</id>
    <link href="http://tm.durusau.net/?p=25103" rel="alternate" type="text/html"/>
    <link href="http://tm.durusau.net/?p=25103#comments" rel="replies" type="text/html"/>
    <link href="http://tm.durusau.net/?feed=atom&amp;p=25103" rel="replies" type="application/atom+xml"/>
    <title xml:lang="en">Evaluating the Design of the R Language</title>
    <summary xml:lang="en">Evaluating the Design of the R Language Sean McDirmid writes: From our recent discussion on R, I thought this paper deserved its own post (ECOOP final version) by Floreal Morandat, Brandon Hill, Leo Osvald, and Jan Vitek; abstract: R is a dynamic language for statistical computing that combines lazy functional features and object-oriented programming. This [...]</summary>
    <content type="xhtml" xml:lang="en"><div xmlns="http://www.w3.org/1999/xhtml"><p><a href="http://lambda-the-ultimate.org/node/4507">Evaluating the Design of the R Language</a></p>
<p>Sean McDirmid writes:</p>
<blockquote><p>From our <a href="http://lambda-the-ultimate.org/node/4503">recent discussion</a> on R, I thought <a href="http://www.cs.purdue.edu/homes/jv/pubs/ecoop12.pdf">this paper</a> deserved its own post (ECOOP final version) by Floreal Morandat, Brandon Hill, Leo Osvald, and Jan Vitek; abstract:</p>
<blockquote><p>R is a dynamic language for statistical computing that combines lazy functional features and object-oriented programming. This rather unlikely linguistic cocktail would probably never have been prepared by computer scientists, yet the language has become surprisingly popular. With millions of lines of R code available in repositories, we have an opportunity to evaluate the fundamental choices underlying the R language design. Using a combination of static and dynamic program analysis we assess the success of different language features.
</p></blockquote>
<p>Excerpts from the paper:</p>
<blockquote><p>R comes equipped with a rather unlikely mix of features. In a nutshell, R is a dynamic language in the spirit of Scheme or JavaScript, but where the basic data type is the vector. It is functional in that functions are first-class values and arguments are passed by deep copy. Moreover, R uses lazy evaluation by default for all arguments, thus it has a pure functional core. Yet R does not optimize recursion, and instead encourages vectorized operations. Functions are lexically scoped and their local variables can be updated, allowing for an imperative programming style. R targets statistical computing, thus missing value support permeates all operations.</p>
<p>One of our discoveries while working out the semantics was how eager evaluation of promises turns out to be. The semantics captures this with C[]; the only cases where promises are not evaluated is in the arguments of a function call and when promises occur in a nested function body, all other references to promises are evaluated. In particular, it was surprising and unnecessary to force assignments as this hampers building infinite structures. Many basic functions that are lazy in Haskell, for example, are strict in R, including data type constructors. As for sharing, the semantics clearly demonstrates that R prevents sharing by performing copies at assignments.</p>
<p>The R implementation uses copy-on-write to reduce the number of copies. With superassignment, environments can be used as shared mutable data structures. The way assignment into vectors preserves the pass-by-value semantics is rather unusual and, from personal experience, it is unclear if programmers understand the feature. … It is noteworthy that objects are mutable within a function (since fields are attributes), but are copied when passed as an argument.</p></blockquote>
</blockquote>
<p>Perhaps not immediately applicable to a topic map task today, but I would argue very relevant to topic maps in general.</p>
<p>In part because it is a reminder that when we write topic maps, topic map interfaces, or languages to be used with topic maps, we are fashioning languages. Languages that may or may not fit how our users view the world and how they tend to formulate queries or statements.</p>
<p>The test for an artificial language should be whether users have to stop to consider the correctness of their writing. Every pause is a sign that an error may be about to occur. Will they remember that this is an SVO language? Is the terminology a familiar one?</p>
<p>Correcting the errors of others may “validate” your self-worth but is that what you want as the purpose of your language?</p>
<p>(I saw this on <a href="http://aliquote.org/memos/">Christophe Lalanne’s</a> blog.)</p></div>
    </content>
    <updated>2012-05-11T20:33:07Z</updated>
    <published>2012-05-11T20:33:07Z</published>
    <category scheme="http://tm.durusau.net" term="Language"/>
    <category scheme="http://tm.durusau.net" term="Language Design"/>
    <category scheme="http://tm.durusau.net" term="R"/>
    <author>
      <name>Patrick Durusau</name>
    </author>
    <source>
      <id>http://tm.durusau.net/?feed=atom</id>
      <link href="http://tm.durusau.net" rel="alternate" type="text/html"/>
      <link href="http://tm.durusau.net/?feed=atom" rel="self" type="application/atom+xml"/>
      <subtitle xml:lang="en">Patrick Durusau on Topic Maps and Semantic Diversity</subtitle>
      <title xml:lang="en">Another Word For It</title>
      <updated>2012-05-12T20:16:46Z</updated>
    </source>
  </entry>

  <entry xml:lang="en">
    <id>http://tm.durusau.net/?p=25100</id>
    <link href="http://tm.durusau.net/?p=25100" rel="alternate" type="text/html"/>
    <link href="http://tm.durusau.net/?p=25100#comments" rel="replies" type="text/html"/>
    <link href="http://tm.durusau.net/?feed=atom&amp;p=25100" rel="replies" type="application/atom+xml"/>
    <title xml:lang="en">Crowdsourcing – A Solution to your “Bad Data” Problems</title>
    <summary xml:lang="en">Crowdsourcing – A Solution to your “Bad Data” Problems by Hollis Tibbetts. Hollis writes: Data problems – whether they be inaccurate data, incomplete data, data categorization issues, duplicate data, data in need of enrichment – are age-old. IT executives consistently agree that data quality/data consistency is one of the biggest roadblocks to them getting full [...]</summary>
    <content type="xhtml" xml:lang="en"><div xmlns="http://www.w3.org/1999/xhtml"><p><a href="http://www.ebizq.net/blogs/integrationedge/2012/01/crowdsourcing---a-solution-to-your-bad-data-problems.php">Crowdsourcing – A Solution to your “Bad Data” Problems</a> by Hollis Tibbetts.</p>
<p>Hollis writes:</p>
<blockquote><p>Data problems – whether they be inaccurate data, incomplete data, data categorization issues, duplicate data, data in need of enrichment – are age-old.</p>
<p>IT executives consistently agree that data quality/data consistency is one of the biggest roadblocks to them getting full value from their data. Especially in today’s information-driven businesses, this issue is more critical than ever.</p>
<p>Technology, however, has not done much to help us solve the problem – in fact, technology has resulted in the increasingly fast creation of mountains of “bad data”, while doing very little to help organizations deal with the problem.</p>
<p>One “technology” holds much promise in helping organizations mitigate this issue – crowdsourcing. I put the word technology in quotation marks – as it’s really people that solve the problem, but it’s an underlying technology layer that makes it accurate, scalable, distributed, connectable, elastic and fast. In an article earlier this week, I referred to it as “Crowd Computing”.</p>
<p><strong>Crowd Computing – for Data Problems</strong></p>
<p>The Human “Crowd Computing” model is an ideal approach for newly entered data that needs to either be validated or enriched in near-realtime, or for existing data that needs to be cleansed, validated, de-duplicated and enriched. Typical data issues where this model is applicable include:</p>
<ul>
<li>Verification of correctness</li>
<li>Data conflict and resolution between different data sources</li>
<li>Judgment calls (such as determining relevance, format or general “moderation”)</li>
<li>“Fuzzy” referential integrity judgment</li>
<li>Data error corrections</li>
<li>Data enrichment or enhancement</li>
<li>Classification of data based on attributes into categories</li>
<li>De-duplication of data items</li>
<li>Sentiment analysis</li>
<li>Data merging</li>
<li>Image data – correctness, appropriateness, appeal, quality</li>
<li>Transcription (e.g. hand-written comments, scanned content)</li>
<li>Translation</li>
</ul>
<p>In areas such as the Data Warehouse, Master Data Management or Customer Data Management, Marketing databases, catalogs, sales force automation data, inventory data – this approach is ideal – or any time that business data needs to be enriched as part of a business process.
</p></blockquote>
<p>Hollis has a number of good points. But the choice doesn’t have to be “big data/iron” versus “crowd computing.” </p>
<p>We are more likely to get useful results from some combination of the two.</p>
<p>Make “big data/iron” responsible for raw access, processing, and visualization in an interactive environment, with semantics supplied by the “crowd computers.”</p>
<p>And vet participants on both sides in real time. It would be a novel thing to have firms competing to supply the interactive environment and being paid on the basis of how many “crowd computers” preferred it or got better results with it.</p>
<p>That is a ways past where Hollis is going but I think it leads naturally in that direction.</p></div>
    </content>
    <updated>2012-05-11T20:11:22Z</updated>
    <published>2012-05-11T20:11:22Z</published>
    <category scheme="http://tm.durusau.net" term="Crowd Sourcing"/>
    <category scheme="http://tm.durusau.net" term="Data Quality"/>
    <author>
      <name>Patrick Durusau</name>
    </author>
    <source>
      <id>http://tm.durusau.net/?feed=atom</id>
      <link href="http://tm.durusau.net" rel="alternate" type="text/html"/>
      <link href="http://tm.durusau.net/?feed=atom" rel="self" type="application/atom+xml"/>
      <subtitle xml:lang="en">Patrick Durusau on Topic Maps and Semantic Diversity</subtitle>
      <title xml:lang="en">Another Word For It</title>
      <updated>2012-05-12T20:16:46Z</updated>
    </source>
  </entry>

  <entry xml:lang="en">
    <id>http://tm.durusau.net/?p=25097</id>
    <link href="http://tm.durusau.net/?p=25097" rel="alternate" type="text/html"/>
    <link href="http://tm.durusau.net/?p=25097#comments" rel="replies" type="text/html"/>
    <link href="http://tm.durusau.net/?feed=atom&amp;p=25097" rel="replies" type="application/atom+xml"/>
    <title xml:lang="en">Debategraph</title>
    <summary xml:lang="en">Debategraph I am not really sure what to make of this, so I thought I would ask you! The “details” report: The objective with Debategraph is not so much an absolutism of rationality as a transparency of rationality; creating a means for people to collaboratively capture and display all of the arguments pertinent to a [...]</summary>
    <content type="xhtml" xml:lang="en"><div xmlns="http://www.w3.org/1999/xhtml"><p><a href="http://debategraph.org/">Debategraph</a></p>
<p>I am not really sure what to make of this, so I thought I would ask you!</p>
<p> <img alt=";-)" class="wp-smiley" src="http://tm.durusau.net/wp-includes/images/smilies/icon_wink.gif"/> </p>
<p>The “details” report:</p>
<blockquote><p>The objective with <em>Debategraph</em> is not so much an absolutism of rationality as a transparency of rationality; creating a means for people to collaboratively capture and display all of the arguments pertinent to a debate clearly and fairly so that all of the participants in the debate have the chance to see the debate as a whole and to understand how the positions they hold exist within that debate. </p></blockquote>
<p>I wonder whether this could serve as an interface for authoring topic maps, perhaps used like a news reader? With links and nodes auto-populated from pre-cooked sub-graphs?</p>
<p>Suggestions?</p></div>
    </content>
    <updated>2012-05-11T19:52:53Z</updated>
    <published>2012-05-11T19:52:53Z</published>
    <category scheme="http://tm.durusau.net" term="Debate"/>
    <category scheme="http://tm.durusau.net" term="Graphs"/>
    <author>
      <name>Patrick Durusau</name>
    </author>
    <source>
      <id>http://tm.durusau.net/?feed=atom</id>
      <link href="http://tm.durusau.net" rel="alternate" type="text/html"/>
      <link href="http://tm.durusau.net/?feed=atom" rel="self" type="application/atom+xml"/>
      <subtitle xml:lang="en">Patrick Durusau on Topic Maps and Semantic Diversity</subtitle>
      <title xml:lang="en">Another Word For It</title>
      <updated>2012-05-12T20:16:46Z</updated>
    </source>
  </entry>

  <entry xml:lang="en">
    <id>http://tm.durusau.net/?p=25094</id>
    <link href="http://tm.durusau.net/?p=25094" rel="alternate" type="text/html"/>
    <link href="http://tm.durusau.net/?p=25094#comments" rel="replies" type="text/html"/>
    <link href="http://tm.durusau.net/?feed=atom&amp;p=25094" rel="replies" type="application/atom+xml"/>
    <title xml:lang="en">Google Chart Tools</title>
    <summary xml:lang="en">Google Chart Tools From the introduction: Google Chart Tools provide a perfect way to visualize data on your website. From simple line charts to complex hierarchical tree maps, the chart gallery provides a large number of well-designed chart types. Populating your data is easy using the provided client- and server-side tools. A chart depends on [...]</summary>
    <content type="xhtml" xml:lang="en"><div xmlns="http://www.w3.org/1999/xhtml"><p><a href="https://developers.google.com/chart/">Google Chart Tools</a></p>
<p>From the introduction:</p>
<blockquote><p>Google Chart Tools provide a perfect way to visualize data on your website. From simple line charts to complex hierarchical tree maps, the chart gallery provides a large number of well-designed chart types. Populating your data is easy using the provided client- and server-side tools.</p>
<p>A chart depends on the following building blocks:</p>
<p><strong>Chart Library</strong><br/>
…<br/>
<strong>Data Styles</strong><br/>
…<br/>
<strong>Data Sources</strong><br/>
…</p></blockquote>
<p>More tools for exploring data. </p>
<p>Not to mention making that analysis available to others.</p></div>
    </content>
    <updated>2012-05-11T19:19:57Z</updated>
    <published>2012-05-11T19:19:57Z</published>
    <category scheme="http://tm.durusau.net" term="Graphics"/>
    <category scheme="http://tm.durusau.net" term="Visualization"/>
    <author>
      <name>Patrick Durusau</name>
    </author>
    <source>
      <id>http://tm.durusau.net/?feed=atom</id>
      <link href="http://tm.durusau.net" rel="alternate" type="text/html"/>
      <link href="http://tm.durusau.net/?feed=atom" rel="self" type="application/atom+xml"/>
      <subtitle xml:lang="en">Patrick Durusau on Topic Maps and Semantic Diversity</subtitle>
      <title xml:lang="en">Another Word For It</title>
      <updated>2012-05-12T20:16:46Z</updated>
    </source>
  </entry>

  <entry xml:lang="en">
    <id>http://tm.durusau.net/?p=25087</id>
    <link href="http://tm.durusau.net/?p=25087" rel="alternate" type="text/html"/>
    <link href="http://tm.durusau.net/?p=25087#comments" rel="replies" type="text/html"/>
    <link href="http://tm.durusau.net/?feed=atom&amp;p=25087" rel="replies" type="application/atom+xml"/>
    <title xml:lang="en">Read’em and Weep</title>
    <summary xml:lang="en">I read Progress Made and Challenges Remaining in Sharing Terrorism-Related Information today. My summary: We are less than five years away from some unknown level of functioning for an Information Sharing Environment (ISE) that facilitates the sharing of terrorism-related information. Less than 20 years after 9/11, we will have some capacity to share information that [...]</summary>
    <content type="xhtml" xml:lang="en"><div xmlns="http://www.w3.org/1999/xhtml"><p>I read <a href="http://www.gao.gov/products/GAO-12-144T">Progress Made and Challenges Remaining in Sharing Terrorism-Related Information</a> today.</p>
<p>My summary: We are less than five years away from some unknown level of functioning for an Information Sharing Environment (ISE) that facilitates the sharing of terrorism-related information. </p>
<p>Less than 20 years after 9/11, we will have some capacity to share information that may enable the potential disruption of terrorist plots.</p>
<p>The patience of terrorists and their organizations is appreciated. (I added that part. The report doesn’t say that.)</p>
<p>The official summary:</p>
<blockquote><p>A breakdown in information sharing was a major factor contributing to the failure to prevent the September 11, 2001, terrorist attacks. Since then, federal, state, and local governments have taken steps to improve sharing. This statement focuses on government efforts to (1) establish the Information Sharing Environment (ISE), a government-wide approach that facilitates the sharing of terrorism-related information; (2) support fusion centers, where states collaborate with federal agencies to improve sharing; (3) provide other support to state and local agencies to enhance sharing; and (4) strengthen use of the terrorist watchlist. GAO’s comments are based on products issued from September 2010 through July 2011 and selected updates in September 2011. For the updates, GAO reviewed reports on the status of Department of Homeland Security (DHS) efforts to support fusion centers, and interviewed DHS officials regarding these efforts. This statement also includes preliminary observations based on GAO’s ongoing watchlist work. For this work, GAO is analyzing the guidance used by agencies to nominate individuals to the watchlist and agency procedures for screening individuals against the list, and is interviewing relevant officials from law enforcement and intelligence agencies, among other things.</p>
<p>The government continues to make progress in sharing terrorism-related information among its many security partners, but does not yet have a fully-functioning ISE in place. In prior reports, GAO recommended that agencies take steps to develop an overall plan or roadmap to guide ISE implementation and establish measures to help gauge progress. These measures would help determine what information sharing capabilities have been accomplished and are left to develop, as well as what difference these capabilities have made to improve sharing and homeland security. Accomplishing these steps, as well as ensuring agencies have the necessary resources and leadership commitment, should help strengthen sharing and address issues GAO has identified that make information sharing a high-risk area.</p>
<p>Federal agencies are helping fusion centers build analytical and operational capabilities, but have more work to complete to help these centers sustain their operations and measure their homeland security value. For example, DHS has provided resources, including personnel and grant funding, to develop a national network of centers. However, centers are concerned about their ability to sustain and expand their operations over the long term, negatively impacting their ability to function as part of the network. Federal agencies have provided guidance to centers and plan to conduct annual assessments of centers’ capabilities and develop performance metrics by the end of 2011 to determine centers’ value to the ISE. DHS and the Department of Justice are providing technical assistance and training to help centers develop privacy and civil liberties policies and protections, but continuous assessment and monitoring policy implementation will be important to help ensure the policies provide effective protections.</p>
<p>In response to its mission to share information with state and local partners, DHS’s Office of Intelligence and Analysis (I&amp;A) has taken steps to identify these partners’ information needs, develop related intelligence products, and obtain more feedback on its products. I&amp;A also provides a number of services to its state and local partners that were generally well received by the state and local officials we contacted. However, I&amp;A has not yet defined how it plans to meet its state and local mission by identifying and documenting the specific programs and activities that are most important for executing this mission. The office also has not developed performance measures that would allow I&amp;A to demonstrate the expected outcomes and effectiveness of state and local programs and activities. In December 2010, GAO recommended that I&amp;A address these issues, which could help it make resource decisions and provide accountability over its efforts.</p>
<p>GAO’s preliminary observations indicate that federal agencies have made progress in implementing corrective actions to address problems in watchlist-related processes that were exposed by the December 25, 2009, attempted airline bombing. These actions are intended to address problems in the way agencies share and use information to nominate individuals to the watchlist, and use the list to prevent persons of concern from boarding planes to the United States or entering the country, among other things. These actions can also have impacts on agency resources and the public, such as traveler delays and other inconvenience. GAO plans to report the results of this work later this year.</p>
<p>GAO is not making new recommendations, but has made recommendations in prior reports to federal agencies to enhance information sharing. The agencies generally agreed and are making progress, but full implementation of these recommendations is needed.</p></blockquote>
<p>Full Report: <a href="http://www.gao.gov/assets/590/585711.pdf">Progress Made and Challenges Remaining in Sharing Terrorism-Related Information</a></p>
<p>Let me share with you the other GAO reports cited in this report:</p>
<ul>
<li><a href="http://www.gao.gov/products/GAO-11-881">Department of Homeland Security: Progress Made and Work Remaining in Implementing Homeland Security Missions 10 Years after 9/11.</a> GAO-11-881. Washington, D.C.: September 7, 2011.</li>
<li><a href="http://www.gao.gov/products/GAO-11-455">Information Sharing Environment: Better Road Map Needed to Guide Implementation and Investments.</a> GAO-11-455. Washington, D.C.: July 21, 2011.</li>
<li><a href="http://www.gao.gov/products/GAO-11-278">High-Risk Series: An Update.</a> GAO-11-278. Washington, D.C.: February 2011.</li>
<li><a href="http://www.gao.gov/products/GAO-11-223">Information Sharing: DHS Could Better Define How It Plans to Meet Its State and Local Mission and Improve Performance Accountability.</a> GAO-11-223. Washington, D.C.: December 16, 2010.</li>
<li><a href="http://www.gao.gov/products/GAO-10-972">Information Sharing: Federal Agencies Are Helping Fusion Centers Build and Sustain Capabilities and Protect Privacy, but Could Better Measure Results.</a> GAO-10-972. Washington, D.C.: September 29, 2010.</li>
<li><a href="http://www.gao.gov/products/GAO-10-703T">Terrorist Watchlist Screening: FBI Has Enhanced Its Use of Information from Firearm and Explosives Background Checks to Support Counterterrorism Efforts.</a> GAO-10-703T. Washington, D.C.: May 5, 2010.</li>
<li><a href="http://www.gao.gov/products/GAO-10-401T">Homeland Security: Better Use of Terrorist Watchlist Information and Improvements in Deployment of Passenger Screening Checkpoint Technologies Could Further Strengthen Security.</a> GAO-10-401T. Washington, D.C.: January 27, 2010.</li>
<li><a href="http://www.gao.gov/products/GAO-10-41">Information Sharing: Federal Agencies Are Sharing Border and Terrorism Information with Local and Tribal Law Enforcement Agencies, but Additional Efforts Are Needed.</a> GAO-10-41. Washington, D.C.: December 18, 2009.</li>
<li><a href="http://www.gao.gov/products/GAO-08-492">Information Sharing Environment: Definition of the Results to Be Achieved in Improving Terrorism-Related Information Sharing Is Needed to Guide Implementation and Assess Progress.</a> GAO-08-492. Washington, D.C.: June 25, 2008.</li>
<li><a href="http://www.gao.gov/products/GAO-08-35">Homeland Security: Federal Efforts Are Helping to Alleviate Some Challenges Encountered by State and Local Information Fusion Centers.</a> GAO-08-35. Washington, D.C.: October 30, 2007.</li>
<li><a href="http://www.gao.gov/products/GAO-06-1031">Terrorist Watch List Screening: Efforts to Help Reduce Adverse Effects on the Public.</a> GAO-06-1031. Washington, D.C.: September 29, 2006.</li>
<li><a href="http://www.gao.gov/products/GAO-06-385">Information Sharing: The Federal Government Needs to Establish Policies and Processes for Sharing Terrorism-Related and Sensitive but Unclassified Information.</a> GAO-06-385. Washington, D.C.: March 17, 2006.</li>
</ul>
<p>Do you see semantic mapping opportunities in all those reports? </p></div>
    </content>
    <updated>2012-05-11T19:14:09Z</updated>
    <published>2012-05-11T19:14:09Z</published>
    <category scheme="http://tm.durusau.net" term="Government"/>
    <category scheme="http://tm.durusau.net" term="Government Data"/>
    <category scheme="http://tm.durusau.net" term="Intelligence"/>
    <author>
      <name>Patrick Durusau</name>
    </author>
    <source>
      <id>http://tm.durusau.net/?feed=atom</id>
      <link href="http://tm.durusau.net" rel="alternate" type="text/html"/>
      <link href="http://tm.durusau.net/?feed=atom" rel="self" type="application/atom+xml"/>
      <subtitle xml:lang="en">Patrick Durusau on Topic Maps and Semantic Diversity</subtitle>
      <title xml:lang="en">Another Word For It</title>
      <updated>2012-05-11T23:40:12Z</updated>
    </source>
  </entry>

  <entry xml:lang="en">
    <id>http://tm.durusau.net/?p=25084</id>
    <link href="http://tm.durusau.net/?p=25084" rel="alternate" type="text/html"/>
    <link href="http://tm.durusau.net/?p=25084#comments" rel="replies" type="text/html"/>
    <link href="http://tm.durusau.net/?feed=atom&amp;p=25084" rel="replies" type="application/atom+xml"/>
    <title xml:lang="en">Learn Hadoop and get a paper published</title>
    <summary xml:lang="en">Learn Hadoop and get a paper published by Allison Domicone. From the post: We’re looking for students who want to try out the Hadoop platform and get a technical report published. (If you’re looking for inspiration, we have some paper ideas below. Keep reading.) Hadoop’s version of MapReduce will undoubtedly come in handy in your [...]</summary>
    <content type="xhtml" xml:lang="en"><div xmlns="http://www.w3.org/1999/xhtml"><p><a href="http://commoncrawl.org/learn-hadoop-and-get-a-paper-published/">Learn Hadoop and get a paper published</a> by Allison Domicone.</p>
<p>From the post:</p>
<blockquote><p>We’re looking for students who want to try out the Hadoop platform and get a technical report published.</p>
<p>(If you’re looking for inspiration, we have some paper ideas below. Keep reading.)</p>
<p>Hadoop’s version of MapReduce will undoubtedly come in handy in your future research, and Hadoop is a fun platform to get to know. Common Crawl, a nonprofit organization with a mission to build and maintain an open crawl of the web that is accessible to everyone, has a huge repository of open data – about 5 billion web pages – and documentation to help you learn these tools.</p>
<p>So why not knock out a quick technical report on Hadoop and Common Crawl? Every grad student could use an extra item in the Publications section of his or her CV.</p>
<p>As an added bonus, you would be helping us out. We’re trying to encourage researchers to use the Common Crawl corpus. Your technical report could inspire others and provide a citable paper for them to reference.</p>
<p>Leave a comment now if you’re interested! Then once you’ve talked with your advisor, follow up to your comment, and we’ll be available to help point you in the right direction technically.</p></blockquote>
<p>How very cool!</p>
<p>Hurry, there are nineteen (19) comments already!</p></div>
    </content>
    <updated>2012-05-10T23:47:11Z</updated>
    <published>2012-05-10T23:47:11Z</published>
    <category scheme="http://tm.durusau.net" term="Common Crawl"/>
    <category scheme="http://tm.durusau.net" term="Hadoop"/>
    <category scheme="http://tm.durusau.net" term="MapReduce"/>
    <author>
      <name>Patrick Durusau</name>
    </author>
    <source>
      <id>http://tm.durusau.net/?feed=atom</id>
      <link href="http://tm.durusau.net" rel="alternate" type="text/html"/>
      <link href="http://tm.durusau.net/?feed=atom" rel="self" type="application/atom+xml"/>
      <subtitle xml:lang="en">Patrick Durusau on Topic Maps and Semantic Diversity</subtitle>
      <title xml:lang="en">Another Word For It</title>
      <updated>2012-05-11T21:50:20Z</updated>
    </source>
  </entry>

  <entry xml:lang="en">
    <id>http://tm.durusau.net/?p=25081</id>
    <link href="http://tm.durusau.net/?p=25081" rel="alternate" type="text/html"/>
    <link href="http://tm.durusau.net/?p=25081#comments" rel="replies" type="text/html"/>
    <link href="http://tm.durusau.net/?feed=atom&amp;p=25081" rel="replies" type="application/atom+xml"/>
    <title xml:lang="en">Many Eyes</title>
    <summary xml:lang="en">Many Eyes I haven’t reproduced all the hyperlinks but if you go to Tour you will find: The heart of the site is a collection of data visualizations. You may want to begin by browsing through these collections—if you’d rather explore than read directions, take a look! On Many Eyes you can: 1. View and [...]</summary>
    <content type="xhtml" xml:lang="en"><div xmlns="http://www.w3.org/1999/xhtml"><p><a href="http://www-958.ibm.com/software/data/cognos/manyeyes/">Many Eyes</a></p>
<p>I haven’t reproduced all the hyperlinks, but if you go to <a href="http://www-958.ibm.com/software/data/cognos/manyeyes/page/Tour.html">Tour</a> you will find:</p>
<blockquote><p>The heart of the site is a collection of data visualizations. You may want to begin by browsing through these collections—if you’d rather explore than read directions, take a look!</p>
<p>On Many Eyes you can:</p>
<p>1. View and discuss visualizations<br/>
2. View and discuss data sets<br/>
3. Create visualizations from existing data sets</p>
<p>If you register, you can also:</p>
<p>4. Rate data sets and visualizations<br/>
5. Upload your own data<br/>
6. Create and participate in topic centers<br/>
7. Select items to watch<br/>
8. Track your contributions, watchlist, and topic centers<br/>
9. See comments that others have written to you </p></blockquote>
<p>From the website:</p>
<blockquote><p><strong>An experiment brought to you by IBM Research and the IBM Cognos software group.</strong></p></blockquote>
<p>Another step closer to data analysis being limited only by your imagination, rather than by access to data or tools.</p>
<p>Well worth an extended visit. </p></div>
    </content>
    <updated>2012-05-10T23:06:34Z</updated>
    <published>2012-05-10T23:06:34Z</published>
    <category scheme="http://tm.durusau.net" term="Graphics"/>
    <category scheme="http://tm.durusau.net" term="Visualization"/>
    <author>
      <name>Patrick Durusau</name>
    </author>
    <source>
      <id>http://tm.durusau.net/?feed=atom</id>
      <link href="http://tm.durusau.net" rel="alternate" type="text/html"/>
      <link href="http://tm.durusau.net/?feed=atom" rel="self" type="application/atom+xml"/>
      <subtitle xml:lang="en">Patrick Durusau on Topic Maps and Semantic Diversity</subtitle>
      <title xml:lang="en">Another Word For It</title>
      <updated>2012-05-11T21:50:20Z</updated>
    </source>
  </entry>

  <entry xml:lang="en">
    <id>http://tm.durusau.net/?p=25078</id>
    <link href="http://tm.durusau.net/?p=25078" rel="alternate" type="text/html"/>
    <link href="http://tm.durusau.net/?p=25078#comments" rel="replies" type="text/html"/>
    <link href="http://tm.durusau.net/?feed=atom&amp;p=25078" rel="replies" type="application/atom+xml"/>
    <title xml:lang="en">Visual Complexity</title>
    <summary xml:lang="en">Visual Complexity Described by Manuel Lima (its creator) as: VisualComplexity.com intends to be a unified resource space for anyone interested in the visualization of complex networks. The project’s main goal is to leverage a critical understanding of different visualization methods, across a series of disciplines, as diverse as Biology, Social Networks or the World Wide [...]</summary>
    <content type="xhtml" xml:lang="en"><div xmlns="http://www.w3.org/1999/xhtml"><p><a href="http://www.visualcomplexity.com/vc/">Visual Complexity</a></p>
<p>Described by Manuel Lima (its creator) as:</p>
<blockquote><p>VisualComplexity.com intends to be a unified resource space for anyone interested in the visualization of complex networks. The project’s main goal is to leverage a critical understanding of different visualization methods, across a series of disciplines, as diverse as Biology, Social Networks or the World Wide Web. I truly hope this space can inspire, motivate and enlighten any person doing research on this field.</p>
<p>Not all projects shown here are genuine complex networks, in the sense that they aren’t necessarily at the <a href="http://en.wikipedia.org/wiki/Edge_of_chaos">edge of chaos</a>, or show an irregular and systematic degree of connectivity. However, the projects that apparently skip this class were chosen for two important reasons. They either provide advancement in terms of visual depiction techniques/methods or show conceptual uniqueness and originality in the choice of a subject. Nevertheless, all projects have one trait in common: the whole is always more than the sum of its parts.</p></blockquote>
<p>The homepage is simply stunning.</p>
<p>BTW, Manuel is also the author of: <a href="http://www.visualcomplexity.com/vc/book/">Visual Complexity: Mapping Patterns of Information</a>.</p></div>
    </content>
    <updated>2012-05-10T22:54:21Z</updated>
    <published>2012-05-10T22:54:21Z</published>
    <category scheme="http://tm.durusau.net" term="Complex Networks"/>
    <category scheme="http://tm.durusau.net" term="Graphics"/>
    <category scheme="http://tm.durusau.net" term="Social Networks"/>
    <category scheme="http://tm.durusau.net" term="Visualization"/>
    <author>
      <name>Patrick Durusau</name>
    </author>
    <source>
      <id>http://tm.durusau.net/?feed=atom</id>
      <link href="http://tm.durusau.net" rel="alternate" type="text/html"/>
      <link href="http://tm.durusau.net/?feed=atom" rel="self" type="application/atom+xml"/>
      <subtitle xml:lang="en">Patrick Durusau on Topic Maps and Semantic Diversity</subtitle>
      <title xml:lang="en">Another Word For It</title>
      <updated>2012-05-11T20:55:08Z</updated>
    </source>
  </entry>

  <entry xml:lang="en">
    <id>http://tm.durusau.net/?p=25074</id>
    <link href="http://tm.durusau.net/?p=25074" rel="alternate" type="text/html"/>
    <link href="http://tm.durusau.net/?p=25074#comments" rel="replies" type="text/html"/>
    <link href="http://tm.durusau.net/?feed=atom&amp;p=25074" rel="replies" type="application/atom+xml"/>
    <title xml:lang="en">EveryBlock</title>
    <summary xml:lang="en">EveryBlock I remember my childhood neighborhood just before the advent of air conditioning and the omnipresence of TV. A walk down the block gave you a good idea of what your neighbors were up to. Or not. Compared to then, the neighborhood where I now live is strangely silent. Walk down my block and [...]</summary>
    <content type="xhtml" xml:lang="en"><div xmlns="http://www.w3.org/1999/xhtml"><p><a href="http://www.everyblock.com/">EveryBlock</a></p>
<p>I remember my childhood neighborhood just before the advent of air conditioning and the omnipresence of TV. A walk down the block gave you a good idea of what your neighbors were up to. Or not. <img alt=";-)" class="wp-smiley" src="http://tm.durusau.net/wp-includes/images/smilies/icon_wink.gif"/> </p>
<p>Compared to then, the neighborhood where I now live is strangely silent. Walk down my block and you hear no TVs, conversations, radios, loud discussions or the like.</p>
<p>We have become increasingly isolated from others by our means of transportation, entertainment and climate control. </p>
<p><a href="http://www.everyblock.com/">EveryBlock</a> offers the promise of restoring some of the random contact with our neighbors to our lives. </p>
<p>EveryBlock says it solves two problems:</p>
<blockquote><p>First, there’s no good place to keep track of everything happening in your neighborhood, from news coverage to events to photography. We try to collect all of the news and civic goings-on that have happened recently in your city, and make it simple for you to keep track of news in particular areas.</p>
<p>Second, there’s no good way to post messages to your neighbors online. Facebook lets you post messages to your friends, Twitter lets you post messages to your followers, but no well-used service lets you post a message to people in a given neighborhood.</p></blockquote>
<p>EveryBlock addresses the problem of geographic blocks, but how do you get information on your professional block?</p>
<p>Do you hear anything unexpected or different? Or do you hear the customary and expected? </p>
<p>Maybe your professional block has gotten too silent.</p>
<p>Suggestions for how to change that? </p></div>
    </content>
    <updated>2012-05-10T22:43:00Z</updated>
    <published>2012-05-10T22:43:00Z</published>
    <category scheme="http://tm.durusau.net" term="Social Media"/>
    <category scheme="http://tm.durusau.net" term="Social Networks"/>
    <author>
      <name>Patrick Durusau</name>
    </author>
    <source>
      <id>http://tm.durusau.net/?feed=atom</id>
      <link href="http://tm.durusau.net" rel="alternate" type="text/html"/>
      <link href="http://tm.durusau.net/?feed=atom" rel="self" type="application/atom+xml"/>
      <subtitle xml:lang="en">Patrick Durusau on Topic Maps and Semantic Diversity</subtitle>
      <title xml:lang="en">Another Word For It</title>
      <updated>2012-05-11T20:55:08Z</updated>
    </source>
  </entry>

  <entry xml:lang="en">
    <id>http://tm.durusau.net/?p=25069</id>
    <link href="http://tm.durusau.net/?p=25069" rel="alternate" type="text/html"/>
    <link href="http://tm.durusau.net/?p=25069#comments" rel="replies" type="text/html"/>
    <link href="http://tm.durusau.net/?feed=atom&amp;p=25069" rel="replies" type="application/atom+xml"/>
    <title xml:lang="en">Simple federated queries with RDF [Part 1]</title>
    <summary xml:lang="en">Simple federated queries with RDF [Part 1] Bob DuCharme writes: A few more triples to identify some relationships, and you’re all set. [side note] Easy aggregation without conversion is where semantic web technology shines the brightest. Once, at an XML Summer School session, I was giving a talk about semantic web technology to a group [...]</summary>
    <content type="xhtml" xml:lang="en"><div xmlns="http://www.w3.org/1999/xhtml"><p><a href="http://www.snee.com/bobdc.blog/2012/04/simple-federated-queries-with.html">Simple federated queries with RDF [Part 1]</a></p>
<p>Bob DuCharme writes:</p>
<blockquote><p><strong>A few more triples to identify some relationships, and you’re all set.</strong></p>
<p>[side note] <strong>Easy aggregation without conversion is where semantic web technology shines the brightest.</strong></p>
<p>Once, at an <a href="http://xmlsummerschool.com/">XML Summer School</a> session, I was giving a talk about semantic web technology to a group that included several presenters from other sessions. This included <a href="http://www.ltg.ed.ac.uk/~ht/">Henry Thompson</a>, who I’ve known since the SGML days. He was still a bit skeptical about RDF, and said that RDF was in the same situation as XML—that if he and I stored similar information using different vocabularies, we’d still have to convert his to use the same vocabulary as mine or vice versa before we could use our data together. I told him he was wrong—that easy aggregation without conversion is where semantic web technology shines the brightest.</p>
<p>I’ve finally put together an example. Let’s say that I want to query across his address book and my address book together for the first name, last name, and email address of anyone whose email address ends with “.org”. Imagine that his address book uses the <a href="http://www.w3.org/TR/vcard-rdf/">vCard</a> vocabulary and the Turtle syntax and looks like this,</p></blockquote>
<p>Bob is an expert in more areas (SGML/XML markup, SPARQL and others) than I can easily count. Not to mention being a good friend.</p>
<p>Take a look at Bob’s post and decide for yourself how “simple” the federation is following Bob’s technique.</p>
<p>I am just going to let it speak for itself today.</p>
<p>I will outline the obvious, and some not-so-obvious, steps in Bob’s “simple” federated queries in Part 2.</p></div>
    </content>
    <updated>2012-05-10T21:12:40Z</updated>
    <published>2012-05-10T21:12:40Z</published>
    <category scheme="http://tm.durusau.net" term="Federation"/>
    <category scheme="http://tm.durusau.net" term="RDF"/>
    <category scheme="http://tm.durusau.net" term="SPARQL"/>
    <author>
      <name>Patrick Durusau</name>
    </author>
    <source>
      <id>http://tm.durusau.net/?feed=atom</id>
      <link href="http://tm.durusau.net" rel="alternate" type="text/html"/>
      <link href="http://tm.durusau.net/?feed=atom" rel="self" type="application/atom+xml"/>
      <subtitle xml:lang="en">Patrick Durusau on Topic Maps and Semantic Diversity</subtitle>
      <title xml:lang="en">Another Word For It</title>
      <updated>2012-05-11T20:11:22Z</updated>
    </source>
  </entry>

  <entry xml:lang="en">
    <id>http://tm.durusau.net/?p=25063</id>
    <link href="http://tm.durusau.net/?p=25063" rel="alternate" type="text/html"/>
    <link href="http://tm.durusau.net/?p=25063#comments" rel="replies" type="text/html"/>
    <link href="http://tm.durusau.net/?feed=atom&amp;p=25063" rel="replies" type="application/atom+xml"/>
    <title xml:lang="en">Workshops Semantic knowledge solutions</title>
    <summary xml:lang="en">Workshops Semantic knowledge solutions by Fiemke Griffioen. From the post: Morpheus is organizing a number of one-day workshops Semantic knowledge solutions about how knowledge applications can be developed within your organization. We show what the advantages are of gaining insight into your knowledge and sharing knowledge. In the workshops our Kamala webapplication is used to [...]</summary>
    <content type="xhtml" xml:lang="en"><div xmlns="http://www.w3.org/1999/xhtml"><p><a href="http://en.mssm.nl/2012/05/10/workshops-semantic-knowledge-solutions/">Workshops Semantic knowledge solutions</a> by Fiemke Griffioen.</p>
<p>From the post:</p>
<blockquote><p>Morpheus is organizing a number of one-day workshops Semantic knowledge solutions about how knowledge applications can be developed within your organization. We show what the advantages are of gaining insight into your knowledge and sharing knowledge.</p>
<p>In the workshops our <a href="http://en.mssm.nl/software/kamala-in-the-cloud/">Kamala webapplication</a> is used to model knowledge. Kamala is a web application for efficiently developing and sharing semantic knowledge and is based on the open source Topic Maps-engine <a href="http://en.mssm.nl/software/ontopia/">Ontopia</a>. Kamala is similar to the editor of Ontopia, Ontopoly, but more interactive and flexible because users require less knowledge of the Topic Maps data model in advance.</p></blockquote>
<p>Since I haven’t covered Kamala before:</p>
<blockquote><p>Kamala includes the following features:</p>
<ul>
<li>Availability of the complete data model of Topic Maps standard</li>
<li>Navigation based on ontological structures</li>
<li>Search topics based on naming</li>
<li>Sharing topic maps with other users (optionally read-only)</li>
<li>Importing and exporting topic maps to the standard formats XTM, TMXML, LTM, CXTM, etc.</li>
<li>Querying topic maps with the TOLOG or TMQL query languages</li>
<li>Storing queries for simple repetition of the query</li>
<li>Validation of topic maps, so that ‘gaps’ in the knowledge model can be traced</li>
<li>Generating statistics</li>
</ul>
<p>The following modules are available to expand Kamala’s core functionality:</p>
<ul>
<li>Geo-module, so topics with a geotag can be placed on a map</li>
<li>Facet indexation for effective navigation based on classification</li>
</ul>
</blockquote>
<p>The workshops are at <a href="http://en.mssm.nl/about-morpheus/contact/">Landgoed Maarsbergen</a>. (That’s what I said, so I included the contact link, which has a map.)</p></div>
    </content>
    <updated>2012-05-10T20:28:11Z</updated>
    <published>2012-05-10T20:28:11Z</published>
    <category scheme="http://tm.durusau.net" term="Kamala"/>
    <category scheme="http://tm.durusau.net" term="Ontopia"/>
    <category scheme="http://tm.durusau.net" term="Ontopoly"/>
    <category scheme="http://tm.durusau.net" term="Topic Map Software"/>
    <author>
      <name>Patrick Durusau</name>
    </author>
    <source>
      <id>http://tm.durusau.net/?feed=atom</id>
      <link href="http://tm.durusau.net" rel="alternate" type="text/html"/>
      <link href="http://tm.durusau.net/?feed=atom" rel="self" type="application/atom+xml"/>
      <subtitle xml:lang="en">Patrick Durusau on Topic Maps and Semantic Diversity</subtitle>
      <title xml:lang="en">Another Word For It</title>
      <updated>2012-05-11T20:11:22Z</updated>
    </source>
  </entry>
</feed>

