What is Natural Language Processing - other than a language assistant addressing me?
by Sarah Holschneider

Natural language processing (NLP) is often mentioned alongside one related buzzword and a handful of brand names, all of which overlap with it but cannot be used as synonyms. The buzzword is “AI” (artificial intelligence), often invoked when people actually mean machine learning. Machine learning is a statistical approach that “learns” patterns from so-called training data. Statistical approaches are indeed widely used in NLP, yet they are not its only mainstay.
Brand names jump out at us when we think of “language” and “computer”: “Alexa”, “Google Home” or “Siri” – all of these voice assistants use NLP. But the rapidly growing field of natural language processing applications can do much more.
At the intersection of linguistics and computer science, Natural Language Processing investigates the interaction between computers and natural languages, i.e. human languages as opposed to programming languages.
Since the late 1980s, and especially since the mid-1990s, much natural language processing research has relied heavily on machine learning (which might be why so many people only see that aspect). Machine learning algorithms can learn rules automatically by analyzing large corpora, i.e. huge collections of typical example texts.
What looks like an easy way to automate and solve text-related problems at first sight may require many pre-processing steps by people with a certain level of linguistic knowledge. In some cases, native speakers can already help (given sufficient motivation). In other cases, you need deeper knowledge of linguistic dependencies and the help of computational linguists.
Systems based on machine-learning algorithms have many advantages over manually written rules. They learn automatically and, because they focus on the most common cases, they do not get entangled in exceptions. However, in order to let statistics work its magic, you need a sufficiently large data set. Rule-based systems are often less scalable, but they can close the gap if you must work with small amounts of data.
Depending on your case, a rule-based system might work better if you have a fixed set of cases to solve. Imagine you want to translate the boilerplate text on a PowerPoint template that your company uses. A simple string.replace() might do the trick, and investing in a statistical approach to machine translation might be far more time- and cost-intensive.
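For such a fixed set of phrases, the whole “translation system” can be a lookup table applied with plain string replacement. A minimal Python sketch (the phrases and their translations below are made-up examples, not real template text):

```python
# Fixed boilerplate phrases and their translations (made-up examples).
TRANSLATIONS = {
    "Confidential": "Vertraulich",
    "All rights reserved": "Alle Rechte vorbehalten",
}

def translate_boilerplate(text: str) -> str:
    """Replace every known boilerplate phrase in the given text."""
    for source, target in TRANSLATIONS.items():
        text = text.replace(source, target)
    return text

print(translate_boilerplate("Confidential - All rights reserved"))
# -> Vertraulich - Alle Rechte vorbehalten
```

For a handful of unchanging phrases this is cheap, transparent and easy to maintain; the trade-off is that it knows nothing about grammar or context, which is exactly why it only works for truly fixed boilerplate.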
»Defining your individual use case as specifically as possible is more than half the battle.«
You may have noticed that some of these use cases already sound like grammar class. That is why we include linguists throughout the development of NLP-related projects at L-One Systems. Just as you would not code a calculator without knowing algebra, you should not code NLP applications without a deep understanding of language.