Truffle Pig: Knwl.js Finds Data Snippets Automatically
What if important information such as dates, times, locations, email addresses, phone numbers, links and other data snippets is hidden in plain text? Mining these valuable fragments used to require a lot of manual work. Not anymore. The JavaScript library Knwl.js can find this information automatically, filter it and make it available for further use. With some creativity, very flexible solutions are possible. Usage is not complicated, so let’s give it a spin.
Knwl.js: Plugins for Recognising Different Content
To kick things off, Knwl.js needs to be included in the HTML head. Afterwards, you can search any text passage for particular content. To do that, pass the text to the method KnwlInstance.init(), either directly as a string or as a variable. Next, decide on a plugin that searches the text for certain patterns. One of the plugins is date, which looks for, well, date information.
KnwlInstance.init("Today is December 23rd 2015.");
var output = KnwlInstance.get("date");
In this example, the plugin date is accessed via KnwlInstance.get(). It digs through the string passed in earlier, searches it for date information and returns all results in JSON format.
var output = [
  {
    "year": 2015,
    "month": 12,
    "day": 23,
    "preview": "Today is December 23rd 2015.",
    "found": 2
  }
]
The returned JSON contains different values depending on the plugin. When searching for a date, the year, month and day are returned as separate fields. In addition, every plugin returns the sentence in which the value was found via preview. The found value tells you at which position in the text the information was found.
When a text contains several matches, Knwl.js returns each one as an individual JSON object.
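To illustrate working with several results, here is a minimal sketch that turns such an array into plain date strings. It does not call Knwl.js at all: the output array is written by hand in the shape shown above, and the second entry as well as the helper name toIsoDate are assumptions made up for this example.

```javascript
// Hand-written result array in the shape shown above.
// The second entry is hypothetical and only serves as a second match.
const output = [
  { year: 2015, month: 12, day: 23, preview: "Today is December 23rd 2015.", found: 2 },
  { year: 2016, month: 1, day: 4, preview: "We are back on January 4th 2016.", found: 4 }
];

// Hypothetical helper: format one result object as YYYY-MM-DD.
function toIsoDate(result) {
  const pad = (n) => String(n).padStart(2, "0");
  return `${result.year}-${pad(result.month)}-${pad(result.day)}`;
}

// Each match is its own object, so the array can be mapped directly.
const isoDates = output.map(toIsoDate);
console.log(isoDates);
```

Because every match arrives as its own object, the usual array methods such as map() or filter() are enough for further processing.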
Date, Time and Location Information Only in English
Knwl.js only recognises date and time information when it is written in English. At least for now, other languages are not supported. The same applies to the place plugin, which recognises country names in texts.
var output = [
  {
    "place": "Germany",
    "preview": "This is Germany.",
    "found": 2
  }
]
Phone numbers pose a similar problem across languages: here, too, only the English notation is supported.
Links and Email Addresses Possible in any Language
Although only English is supported for dates, times and places, it is still possible to use Knwl.js on texts in other languages, at least for links and email addresses.
var output = [
  {
    "link": "http://www.drweb.de/",
    "preview": "At the German site http://www.drweb.de/ you can find daily news.",
    "found": 1
  }
]
When searching for links, it is important that the respective protocol ("http://", "https://" or "ftp://") is given. Email addresses are also recognised reliably.
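Why the protocol matters becomes clearer with a small stand-alone sketch. This is not how Knwl.js is implemented. It is only an illustration of protocol-based matching with a regular expression, and the names LINK_PATTERN and findLinks are made up for this example.

```javascript
// Assumption for this sketch: a link counts only if it starts with
// an explicit protocol (http, https or ftp), in any letter case.
const LINK_PATTERN = /\b(?:https?|ftp):\/\/[^\s<>"']+/gi;

// Hypothetical helper: return all protocol-prefixed links in a text.
function findLinks(text) {
  const results = [];
  for (const match of text.matchAll(LINK_PATTERN)) {
    // Drop punctuation that belongs to the sentence, not to the URL.
    const link = match[0].replace(/[.,;:!?)]+$/, "");
    results.push({ link: link, preview: text });
  }
  return results;
}

const links = findLinks("Visit http://www.drweb.de/ or www.example.com for daily news.");
console.log(links.map((entry) => entry.link));
```

In this sketch, www.example.com is skipped because nothing marks it as a link. A leading protocol is an unambiguous signal that works independently of the language of the surrounding text.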
Develop Your Own Plugin
If you want to support the recognition of time and location information in other languages, you will need to get your hands dirty and develop a custom plugin for Knwl.js. The library’s documentation has a dedicated section on that topic. Each plugin lives in its own JavaScript file.
This way, you can build plugins relatively quickly. Of course, plugins are not limited to supporting other languages. You can also develop plugins that search a text for other things, such as metric units, currencies or colours.
Some experimental plugins can be found along with the Knwl.js documentation.
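To give an idea of the kind of logic a colour plugin might contain, here is a minimal stand-alone sketch that looks for hexadecimal colour codes. It deliberately does not use the Knwl.js plugin interface, which is described in the library’s documentation. The function name findHexColours, the colour field and the choice to report found as a zero-based word index are all assumptions of this example.

```javascript
// Hypothetical stand-alone finder, NOT a real Knwl.js plugin.
// It returns objects loosely modelled on the result shape shown earlier.
function findHexColours(text) {
  const words = text.split(/\s+/).filter(Boolean);
  const results = [];
  words.forEach((word, index) => {
    // Accept #rgb or #rrggbb, optionally followed by sentence punctuation.
    const match = /^#(?:[0-9a-f]{3}|[0-9a-f]{6})(?=[.,;:!?)]*$)/i.exec(word);
    if (match) {
      results.push({
        colour: match[0].toLowerCase(),
        preview: text, // simplification: the whole input, not one sentence
        found: index   // assumption: zero-based word index
      });
    }
  });
  return results;
}

const colours = findHexColours("Use #FF8800 for links and #abc for borders.");
console.log(colours.map((entry) => entry.colour + " at word " + entry.found));
```

A real plugin would wrap comparable matching logic in the structure that Knwl.js expects, so that its results can be retrieved like those of the built-in plugins.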
Conclusion
Knwl.js offers many ways of filtering structured data from texts. While adjustments are needed for texts in languages other than English, it allows you to create flexible solutions when you approach it with a bit of imagination.
Demo to try it out
Besides the documentation, there is also a demo in which you can enter any desired text and have Knwl.js dig through it.
(dpe)