We’ve updated our Similarity Search plugin to be fully functional with WordPress’ recent 3.0 release! If you have the plugin already installed, you can just go to your Dashboard and update it through the Plugins page. To download it fresh as a new user, visit our plugin’s homepage in the WordPress Plugin Directory.
One thing we’ve been anticipating with WordPress 3.0 is the ability to run multiple sites. This functionality was previously covered by a separate WordPress product, WordPress MU. Multisite is the new feature that takes over from MU, and it’s integrated right into the 3.0 release. With just a few tweaks, you can be up and running with multiple blogs on one WordPress installation in no time. The REALLY cool thing is that our plugin is also compatible with the new multisite mode. That means if you administer multiple blogs using a multisite instance of WordPress 3.0, you install our plugin once on the main blog, and it’s available to all of the sub-sites. Pretty neat!
As always, we love to hear your feedback, the good or the bad. Good lets us know we’re on the right track, and bad lets us know when we’re not (or if something’s broken and we didn’t find it). Have fun with the plugin, and happy blogging!
0 Comments » Posted in Uncategorized by Jay Baker
Our API was unavailable for some users from 1:20 to 2:20 EDT today. The issues have now been fixed and access has been restored. We apologize for any inconvenience this caused.
Please contact us if you have any questions.
0 Comments » Posted in Uncategorized by BarryRubinson
For those of you using WordPress and the TextWise WordPress plugin, the long anticipated version 3.0 of WordPress has been released. We’ve been testing our plugin using release candidate versions to get a head start on fixing issues, and we’re working to get a new release of the plugin out that’s compatible with WordPress 3.0. However, we know that our current plugin WILL work with 3.0 with one very minor tweak, which you can do yourself to continue using it until we release the fully functional replacement. WordPress 3.0 contains a change that breaks our Tag functionality. All you have to do to make all of the functionality of the plugin, minus Tags, work again is the following:
1. Log into your Admin Dashboard
2. Click on Settings
3. Click on TextWise
4. Remove the check in the checkbox next to Tags in the “Select TextWise Content Suggestions” section
5. Click on Save Changes at the bottom of the page
That’s it! You will still be able to add tags manually. We’ll have the replacement plugin out to you as soon as possible. And by all means, if you discover an issue, please let us know!
0 Comments » Posted in News, qa by Jay Baker
Has this ever happened to you? You are Googling for information on the Web, but inadvertently your query happens to share keywords with the latest cultural phenom: the next tweener heart throb, a YouTube video suddenly gone viral, or yet another paranoid political fantasy that refuses to die.
You are a professional, however, and so switch into Advanced Mode to reshape your query, but to no avail. Your information has been buried under pop detritus; it has been hijacked by the maximum likelihood estimate (MLE) on the Web.
At times like this, you want to grab your search engine by the neck and shout, “I am NOT a screaming twelve-year-old girl into dancing cats and fixated on the President’s birth place!” But your search engine continues blithely in the wisdom of the crowd.
It is a reminder that statistically grounded information systems are at the mercy of their training data. If we cede too much control of a system to its finely wrought black box judgment, then we sometimes are going to run off the tracks. This is especially true with web semantics.
If we do in fact want to get under the hood to adjust a semantic system to go against the popular flow, then it helps tremendously if the categories underlying the representation of document content are intelligible to people. Such transparency is a prime motivation for how semantic dictionaries are currently built by TextWise.
Of course, if you care nary a lick about transparency, then may I interest you in this slightly used synthetic collateralized debt obligation….
1 Comment » Posted in General, Science, Semantic Search, semantics by Clinton Mah
When people read text, they may not understand everything in it. For example, a layman might look at an article from a medical journal and see only that it is about some kind of drug. Someone more familiar with medicine would pick up that this is an experimental drug for treating estrogen-sensitive breast cancer. An expert would note that the drug is an aromatase blocker that performs as well as a standard approved drug in double-blind controlled trials with a large sample of patients.
If an application seeks simply to distinguish documents about pharmaceuticals from documents about toxic financial assets or about the World Cup tournament in South Africa, then it is enough to understand at a superficial level. If a physician is searching for treatment options for a patient with a recurrence of breast cancer, however, a much deeper grasp of content is called for.
A general type of semantic dictionary covering a broad variety of different subjects is more or less forced to opt for broad coverage by default. Collecting enough training data for two thousand dimensions is a major undertaking; having to do it for twenty thousand dimensions will entail a big commitment of resources that one will have to justify. Still, if such a dictionary is critical for a given application, then we need to make the investment.
In many cases the domain of content to be covered can be quite circumscribed. Accordingly, we probably would be better off to add a fairly small number of dimensions to an existing semantic dictionary rather than build a whole new dictionary from scratch. This will require some special statistical balancing of course, but balancing is what dictionary building is all about.
1 Comment » Posted in General, Science, semantics by Clinton Mah
The 2010 Semantic Technology Conference will take place from June 21 – 25 in San Francisco.
This year our Executive Director of Science, Wen Ruan, will be on the ‘Semantic Social Networking’ panel along with Amit P. Sheth, Professor & Director of Kno.e.sis, Wright State University and Jamie Taylor who oversees data operations at Metaweb Technologies.
This panel will present some approaches and tools that combine semantics with social network data, visualization of relationships, measurement and interactions, and user-generated content analysis. For more on the topics being covered, go to the schedule. The session is scheduled on June 24th from 4:45pm – 5:45pm PST.
Watch a video of last year’s panel that Wen hosted: “Semantic Search: Beyond RDF”
Hope to see you there!
0 Comments » Posted in News by Rebecca Povio
We have been musing about the true topology of semantic spaces and how this affects our concept of dimensionality. This segues logically into a hot area of contention. In our linear approximation of meaning, how many dimensions do we really need and what should they be?
Some people prefer to approach this problem mathematically. Given a representative sample of documents to describe semantically, we can look at the relationship between terms and documents as defining a vector space. One can then apply the method of singular value decomposition (SVD) to find a minimal set of basis vectors to span that space. These singular vectors are like eigenvectors on steroids.
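To make the idea concrete, here is a minimal sketch of how the dominant singular vectors of a tiny term-document matrix might be found. It uses plain power iteration as a stand-in for a full SVD routine, and the matrix, the term labels, and the function name `top_singular_triplet` are all made up for illustration; nothing here reflects how any particular product does it.

```python
import math

def top_singular_triplet(matrix, iterations=200):
    """Estimate the largest singular value and its singular vectors.

    Uses power iteration on A^T A, which is only a stand-in for a full SVD.
    """
    rows, cols = len(matrix), len(matrix[0])
    v = [1.0 / math.sqrt(cols)] * cols  # starting guess for the right singular vector
    for _ in range(iterations):
        # u = A v
        u = [sum(matrix[i][j] * v[j] for j in range(cols)) for i in range(rows)]
        # w = A^T u, then renormalize to get the next guess for v
        w = [sum(matrix[i][j] * u[i] for i in range(rows)) for j in range(cols)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Recover the singular value and the left singular vector from the final v
    u = [sum(matrix[i][j] * v[j] for j in range(cols)) for i in range(rows)]
    sigma = math.sqrt(sum(x * x for x in u))
    u = [x / sigma for x in u]
    return sigma, u, v

# Hypothetical counts: rows are terms (drug, cancer, goal, match),
# columns are documents (two medical articles, one sports report).
term_doc = [
    [2.0, 1.0, 0.0],
    [1.0, 2.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.0, 0.0, 1.0],
]

sigma, term_direction, doc_direction = top_singular_triplet(term_doc)
print(round(sigma, 6))                          # strength of the dominant "concept"
print([round(x, 3) for x in term_direction])    # how each term loads on it
print([round(x, 3) for x in doc_direction])     # how each document loads on it
```

In this toy case the dominant direction loads on the two medical terms and the two medical documents, which is the sense in which a singular vector acts as a derived dimension of meaning.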
If you have actually read this far into this blog, then you will know that we (TextWise) have a competitor that employs SVD for semantic analysis. We get asked all the time why we have stuck with basic statistical techniques when we could instead be rigorously mathematical. Our usual response is that we have much faster turnaround in building semantic dictionaries, finer-grain descriptions of content, and more intuitive concepts overall.
There are more fundamental concerns, however, both theoretical and practical. On the theoretical side, SVD might be pushing a linear-space semantic model too far if meaning is in fact topologically complex. More significant on the practical side, though, is the risk of getting caught in the common problem of overtraining.
Suppose that we have a hundred thousand blog postings to which we apply SVD to get some optimal set of dimensions for analyzing their content. What then happens next week when we get a million new blogs that we have never seen before? Our perfect basis set is now distinctly handicapped.
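One rough way to see the handicap is to project documents onto a fixed basis and ask how much of each document the basis fails to capture. The sketch below assumes an orthonormal basis that was derived earlier from old data; the basis vectors, the term labels, and the function names are hypothetical and chosen only to illustrate the point.

```python
import math

def project(doc, basis):
    """Coordinates of a document vector in an orthonormal basis."""
    return [sum(d * b for d, b in zip(doc, vec)) for vec in basis]

def residual_fraction(doc, basis):
    """Fraction of the document's squared length the basis cannot represent."""
    coords = project(doc, basis)
    total = sum(d * d for d in doc)
    captured = sum(c * c for c in coords)
    return 1.0 - captured / total

s = 1.0 / math.sqrt(2.0)
# Hypothetical basis learned from last week's postings.
# Term order: drug, cancer, goal, match, tweet
basis = [
    [s, s, 0.0, 0.0, 0.0],   # a "medicine" direction
    [0.0, 0.0, s, s, 0.0],   # a "sports" direction
]

old_doc = [2.0, 2.0, 0.0, 0.0, 0.0]   # resembles the training data
new_doc = [1.0, 0.0, 0.0, 0.0, 3.0]   # dominated by a term the basis ignores

old_loss = residual_fraction(old_doc, basis)
new_loss = residual_fraction(new_doc, basis)
print(round(old_loss, 6))   # close to zero: the old basis describes this document well
print(round(new_loss, 6))   # most of the new document falls outside the basis
```

A high residual on unseen documents is the signal that the basis was fitted too tightly to the original sample.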
Now we could try to reprocess all our data here, but SVD is so computationally intensive as an algorithm that it probably will be too slow to keep up without extraordinary investments in hardware resources. We also would end up with an unstable system in which it is quite difficult to compare results from one week to the next. Anyway, we made our choice here.
0 Comments » Posted in General, Opinion, Science, semantics by Clinton Mah
TextWise is pleased to release an updated version of our WordPress plugin which now supports WordPress version 2.9.2. After a few rigorous rounds of development and testing, then more development and testing, the plugin is now available for you to enjoy on your blogs. We’ve worked hard to maintain compatibility with all WordPress versions from 2.9.2 back to 2.6.1. We have our eye on WordPress and know that they’re releasing version 3.0 soon, as well. So we’ll be working to ensure our plugin works with that, too.
If you’re currently a user of our plugin, thanks for using it and giving us great feedback to make it even better. If you aren’t using the plugin…why not? Head on over to http://wordpress.org/extend/plugins/textwise/ and download it to enhance your WordPress blog with relevant media, tags, links, and more! Enjoy!
1 Comment » Posted in News, Update by Jay Baker
People in the information sciences are fond of high-dimensional vector spaces as models of document content. These are in fact only approximations of reality, however; and in the specific case of semantics, they are probably an oversimplification. We already know something about how the neural circuitry in our brains works when we process the meaning of language; we can find no clean finite-dimensional linear space in the tangle of our synapses.
Neural imaging like PET does support the theory that linguistic concepts correspond to particular clusters of neurons connected in fairly complex feedback loops. Our understanding here is still quite limited, though. We do not know how many such clusters exist or how widely they are distributed. Visual concepts are in a different part of the brain than auditory concepts, for example; and overall, we have not yet found any obvious switchboard, say in the hippocampus, that could somehow tie everything together neatly.
In our computational semantic model, we assume that all concepts are independent and equal. That seems to work in semantic dictionary applications when we have thousands of concepts as dimensions, but an epistemologist here would have the lurking suspicion that our actual semantic space has to be some kind of complex manifold with all kinds of holes and twisting surfaces like a deranged n-th-order Moebius strip. Meaning is messy.
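As a rough illustration of what "independent and equal" means in practice, the sketch below treats each concept as its own axis and compares documents by cosine similarity. The concept names, the weights, and the `cosine` helper are assumptions made up for this example, not an actual semantic dictionary.

```python
import math

def cosine(a, b):
    """Cosine similarity of two sparse concept vectors (dicts of concept -> weight).

    Every concept is its own axis with equal standing, so only concepts
    that both documents share contribute to the score.
    """
    dot = sum(weight * b.get(concept, 0.0) for concept, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical concept weights for three documents
trial_report = {"pharmaceuticals": 3.0, "oncology": 4.0}
cancer_note = {"oncology": 1.0}
match_report = {"soccer": 5.0}

shared = cosine(trial_report, cancer_note)
unrelated = cosine(trial_report, match_report)
print(round(shared, 6))
print(round(unrelated, 6))
```

Note what the model cannot see: "pharmaceuticals" and "oncology" are plainly related in meaning, yet as independent axes they contribute nothing to each other. That blind spot is the kind of thing the manifold suspicion is about.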
Our linear Euclidean model may therefore be valid only in a small local region of our actual semantic space, but in practice, that is really where all our apps have to live. One cannot presume to comprehend all possible content in text. We can only slice off a small piece of the pie of meaning, and until world peace and perfect enlightenment break out, that is a good start.
0 Comments » Posted in General, Opinion, Science, semantics by Clinton Mah
We have been thinking lately about how many dimensions a semantic dictionary should have. Some researchers at Carnegie Mellon have been approaching the same question from the perspective of neuroscience and real-time imaging of activity in the human brain while understanding language (http://bit.ly/buIZEx).
According to CMU, there are really only THREE basic semantic dimensions: (1) Can I eat it? (2) Can I pick it up? (3) Can I hide in it? Admittedly, this primitive partitioning of the world probably goes back to our primate origins, but it does have a certain resonance. Let’s remember it the next time we try to categorize journal articles in nanotechnology or search postings on someone’s Facebook wall.
0 Comments » Posted in General, News, Science, semantics by Clinton Mah