Wednesday, April 19, 2017

Having a chat with Solr using the new echo Streaming Expression

In the next release of Solr, there is a new and interesting Streaming Expression called echo.

echo is a very simple expression with the following syntax:

echo("Hello World")

If we send this to Solr, it responds with:

{ "result-set": { "docs": [ { "echo": "Hello World" }, { "EOF": true, "RESPONSE_TIME": 0 } ] } }
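Expressions like echo are sent to Solr's /stream request handler in the expr parameter. The sketch below only builds the HTTP request for the echo expression. It assumes Java 11+ and a hypothetical collection named collection1 on localhost; actually sending the request requires a running Solr instance.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

public class EchoRequest {
    // Hypothetical endpoint: change the host, port and collection name to match your setup.
    static final String STREAM_URL = "http://localhost:8983/solr/collection1/stream";

    // Form-encode the expression as the value of the "expr" request parameter.
    static String formBody(String expr) {
        return "expr=" + URLEncoder.encode(expr, StandardCharsets.UTF_8);
    }

    // Build, but do not send, a POST request that carries the expression.
    static HttpRequest buildRequest(String expr) {
        return HttpRequest.newBuilder(URI.create(STREAM_URL))
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(formBody(expr)))
                .build();
    }

    public static void main(String[] args) {
        String expr = "echo(\"Hello World\")";
        HttpRequest request = buildRequest(expr);
        System.out.println(request.method() + " " + request.uri());
        System.out.println(formBody(expr));
        // With Solr running, send it with:
        // HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())
    }
}
```

The response body of that request is the JSON result-set shown above.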

Solr simply echoes the text back, but maybe it feels a bit like Solr is talking to us. Like there might be someone there.

Well, it turns out that this simple exchange is the first step towards a more meaningful conversation.

Let's take another step:

classify(echo("Customer service is just terrible!"),
         model(models, id="sentiment"),
         field="echo",
         analyzerField=analyzerField)

Now we are echoing text to a classifier. The classify function points to a model stored in Solr that performs sentiment analysis on the text. Notice that the classify function has an analyzer field parameter. The Lucene/Solr analyzer of that field is used by the classify function to pull the features from the text (see this blog for more details on the classify function).

If we send this to Solr we may get a response like this:

{ "result-set": { "docs": [ { "echo": "Customer service is just terrible!",
"probability_d":0.94888 }, { "EOF": true, "RESPONSE_TIME": 0 } ] } }

The probability_d field is the probability that the text has a negative sentiment. In this case there is roughly a 95% probability (0.94888) that the text is negative.

Now Solr knows something about what's being said. We can wrap other Streaming Expressions around this to take actions or begin to formulate a response.

But we really don't yet have enough information to make a very informed response.

We can take this a bit further.

Consider this expression:

select(echo("Customer service is just terrible!"),
           analyze(echo, analyzerField) as expr_s)

The expression above uses the select expression to echo the text to the analyze Stream Evaluator. The analyze Stream Evaluator applies a Lucene/Solr analyzer to the text and returns a token stream. In this case, however, it returns a single token, which is a Streaming Expression.

(See this blog for more details on the analyze Stream Evaluator)

To make this work, you would define the final step of the analyzer chain as a token filter that builds a Streaming Expression from the natural language parsing done earlier in the chain.
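A real implementation would be a Lucene TokenFilter configured at the end of a field type's analyzer chain. To stay dependency-free, the sketch below models the chain as plain Java stages: a tokenizer, a lowercase filter, and a hypothetical final stage that collapses all tokens into one token containing a Streaming Expression. The keyword rule, the tickets collection and the body_t field are invented placeholders standing in for real natural language processing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class ExpressionBuildingChain {
    // Stage 1: a crude tokenizer that splits on any run of non-letter, non-digit characters.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Stage 2: lowercase every token.
    static List<String> lowercase(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            out.add(t.toLowerCase(Locale.ROOT));
        }
        return out;
    }

    // Hypothetical topic vocabulary that stands in for real natural language parsing.
    static final Set<String> TOPICS = Set.of("service", "billing", "shipping");

    // Stage 3 (final): collapse the whole token stream into ONE token holding a Streaming Expression.
    // The collection "tickets" and field "body_t" are made-up names for illustration.
    static List<String> toExpression(List<String> tokens) {
        String topic = tokens.stream().filter(TOPICS::contains).findFirst().orElse("*");
        String expr = "search(tickets, q=\"body_t:" + topic + "\", fl=\"id,body_t\", sort=\"id asc\", rows=10)";
        return List.of(expr);
    }

    // Run the full chain: tokenize, lowercase, then build the expression token.
    static List<String> analyze(String text) {
        return toExpression(lowercase(tokenize(text)));
    }

    public static void main(String[] args) {
        System.out.println(analyze("Customer service is just terrible!").get(0));
    }
}
```

The important design point is that the last stage emits a single token, which is exactly what the select/analyze construct above assigns to expr_s.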

Now we can wrap this construct in the new eval expression:

eval(select(echo("Customer service is just terrible!"),
                  analyze(echo, analyzerField) as expr_s))

The eval expression compiles and runs the Streaming Expression created by the analyzer, and it emits the tuples that the compiled expression emits. These tuples are the response to the natural language request.
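The eval step can be pictured as compile-then-run. In the toy version below, "compiling" means splitting the expression text into a function name and its top-level parameters and looking the name up in a hypothetical registry in which only echo is registered. Solr's real eval relies on Solr's own expression parser and stream classes, so treat this purely as an illustration of the control flow; the expr_s field name follows the example above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class EvalSketch {
    // A compiled expression: running it yields the tuples it emits (tuples modeled as Maps).
    interface Compiled {
        List<Map<String, Object>> run();
    }

    // Hypothetical registry mapping function names to factories that take raw parameter strings.
    static final Map<String, Function<List<String>, Compiled>> REGISTRY = new HashMap<>();

    static {
        // echo("text") emits a single tuple with the text in the "echo" field.
        REGISTRY.put("echo", params -> () -> List.of(Map.<String, Object>of("echo", unquote(params.get(0)))));
    }

    // Strip one pair of surrounding double quotes, if present.
    static String unquote(String s) {
        s = s.trim();
        return (s.length() >= 2 && s.startsWith("\"") && s.endsWith("\"")) ? s.substring(1, s.length() - 1) : s;
    }

    // Split on commas that are not nested in parentheses or quotes (escaped quotes are not handled).
    static List<String> splitTopLevel(String body) {
        List<String> params = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int depth = 0;
        boolean inQuotes = false;
        for (char c : body.toCharArray()) {
            if (c == '"') inQuotes = !inQuotes;
            if (!inQuotes && c == '(') depth++;
            if (!inQuotes && c == ')') depth--;
            if (!inQuotes && depth == 0 && c == ',') {
                params.add(current.toString().trim());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        if (current.length() > 0) params.add(current.toString().trim());
        return params;
    }

    // "Compile": find the function name, split its parameters, and hand them to the registered factory.
    static Compiled compile(String expr) {
        int open = expr.indexOf('(');
        int close = expr.lastIndexOf(')');
        if (open < 0 || close < open) throw new IllegalArgumentException("Not a function call: " + expr);
        String name = expr.substring(0, open).trim();
        Function<List<String>, Compiled> factory = REGISTRY.get(name);
        if (factory == null) throw new IllegalArgumentException("Unknown function: " + name);
        return factory.apply(splitTopLevel(expr.substring(open + 1, close)));
    }

    // eval: compile the expression held in each incoming tuple's expr_s field and emit its tuples.
    static List<Map<String, Object>> eval(List<Map<String, Object>> incoming) {
        List<Map<String, Object>> out = new ArrayList<>();
        for (Map<String, Object> tuple : incoming) {
            out.addAll(compile((String) tuple.get("expr_s")).run());
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> in = List.of(Map.<String, Object>of("expr_s", "echo(\"Sorry to hear that.\")"));
        System.out.println(eval(in));
    }
}
```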

The heavy lifting is done in the analysis chain which performs the NLP and generates the Streaming Expression response.

Streaming Expressions as an AI Language

Before Streaming Expressions existed, Dennis Gove sent me an email with his initial design for the Streaming Expression syntax. The initial syntax used Lisp-like S-Expressions. I took one look at the S-Expressions and realized we were building an AI language. I'll get into more detail about how this syntax ties into AI shortly, but first a little more history on Streaming Expressions.

The S-Expressions were replaced with the more familiar function syntax that Streaming Expressions have today. This decision was made by Dennis and Steven Bower. It turned out to be the right call: the syntax is more familiar than Lisp, yet it keeps many of Lisp's most important qualities.

Dennis contributed the Streaming Expression parser, and I began looking for something interesting to do with it. The very first thing I tried to do with Streaming Expressions was to rewrite SQL queries as Streaming Expressions for the Parallel SQL interface. For this project, a SQL parser was used to parse the queries, and then a simple planner was built that generated Streaming Expressions to implement the physical query plan.

This was an important proving ground for Streaming Expressions for a number of reasons. It proved that Streaming Expressions could provide the functionality needed to implement the SQL query plans. It proved that Streaming Expressions could push functionality down into the search engine and also rise above the search engine using MapReduce when needed.

Most importantly, from an AI standpoint, it proved that we could easily generate Streaming Expressions programmatically. This was one of the key features that made Lisp a useful AI language. Streaming Expressions are easy to generate because the syntax is extremely regular: there are only nested functions. And because Streaming Expressions have an underlying Java object representation, we didn't have to do any string manipulation. We could work directly with the object tree to build the expressions.
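To make the object-tree idea concrete, here is a small sketch. SolrJ has its own classes for this representation (for example StreamExpression); the sketch deliberately uses hypothetical stand-ins (Expr, Named, Value) so it runs without Solr on the classpath, and planSearch is a toy planner step, not the Parallel SQL planner. The collection and field names are invented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

public class ExpressionTree {
    // Every parameter can render itself as Streaming Expression text.
    interface Param {
        String render();
    }

    // A bare value such as a collection name or "fielda as outField".
    static final class Value implements Param {
        private final String text;
        Value(String text) { this.text = text; }
        public String render() { return text; }
    }

    // A named parameter such as q="*:*"; this sketch always quotes the value.
    static final class Named implements Param {
        private final String name;
        private final String value;
        Named(String name, String value) { this.name = name; this.value = value; }
        public String render() { return name + "=\"" + value + "\""; }
    }

    // A function call: a name plus an ordered list of parameters, which may themselves be calls.
    static final class Expr implements Param {
        private final String function;
        private final List<Param> params = new ArrayList<>();
        Expr(String function) { this.function = function; }
        Expr with(Param p) { params.add(p); return this; }
        public String render() {
            return function + "("
                    + params.stream().map(Param::render).collect(Collectors.joining(", "))
                    + ")";
        }
    }

    // Toy planner step: a filtered, sorted read of some fields, built as a tree rather than a string.
    static Expr planSearch(String collection, String query, String fields, String sort) {
        return new Expr("search")
                .with(new Value(collection))
                .with(new Named("q", query))
                .with(new Named("fl", fields))
                .with(new Named("sort", sort))
                .with(new Named("qt", "/export"));
    }

    public static void main(String[] args) {
        // Wrapping is just adding a node: select takes the search tree as its first parameter.
        Expr plan = new Expr("select")
                .with(planSearch("logs", "level_s:ERROR", "id,count_i", "count_i desc"))
                .with(new Value("id"))
                .with(new Value("count_i as errorCount"));
        System.out.println(plan.render());
    }
}
```

Because wrapping one expression in another is a tree operation, a planner (or an NLP step) never has to splice strings; it only renders text at the very end.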

Why is code generation important for AI? One of the reasons is shown earlier in this blog. A core AI use case is to respond to natural language requests. One approach to doing this is to analyze the text request and then generate code to implement a response. In many ways it's similar to the problem of translating SQL to a physical query plan.

In a more general sense, code generation is important in AI because you're dealing with many unknowns, so it can be difficult to code everything up front. Sometimes you may need to generate logic on the fly.

Domain Specific Languages

Lisp can adapt its syntax to specific domains through its powerful macro feature. Streaming Expressions have this capability as well, but achieve it in a different way.

Each Streaming Expression is implemented in Java under the covers, and each Streaming Expression is responsible for parsing its own parameters. This means you can have Streaming Expressions that invent their own little languages. The select expression is a perfect example of this.

The basic select expression looks like this:

select(expr, fielda as outField)

This select reads tuples from a stream and outputs fielda as outField. The Streaming Expression parser has no concept of the word "as". This is specific to the select expression and the select expression handles the parsing of "as".

This works because, under the covers, each Streaming Expression receives its parameters as a list that it can interpret any way it wants.
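The sketch below separates the two layers described above: a generic layer that only splits an expression into its top-level parameters (and has no idea what "as" means), and a select-specific layer that interprets "as". It is a plain-Java illustration with hypothetical method names, not Solr's actual select implementation; tuples are modeled as Maps.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SelectSketch {
    // Generic layer: return the top-level parameters of "name(p1, p2, ...)" as opaque strings.
    // Commas inside nested parentheses or double quotes do not split (escaped quotes are not handled).
    static List<String> parameters(String expr) {
        String body = expr.substring(expr.indexOf('(') + 1, expr.lastIndexOf(')'));
        List<String> params = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int depth = 0;
        boolean inQuotes = false;
        for (char c : body.toCharArray()) {
            if (c == '"') inQuotes = !inQuotes;
            if (!inQuotes && c == '(') depth++;
            if (!inQuotes && c == ')') depth--;
            if (!inQuotes && depth == 0 && c == ',') {
                params.add(current.toString().trim());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        if (current.length() > 0) params.add(current.toString().trim());
        return params;
    }

    // select-specific layer: only here does "as" mean anything.
    // Maps output field -> input field; nested stream parameters (containing "(") are skipped.
    static Map<String, String> fieldMappings(List<String> params) {
        Map<String, String> outToIn = new LinkedHashMap<>();
        for (String p : params) {
            if (p.contains("(")) continue;
            String[] parts = p.split("\\s+as\\s+");
            if (parts.length == 2) {
                outToIn.put(parts[1].trim(), parts[0].trim());
            } else {
                outToIn.put(p, p);
            }
        }
        return outToIn;
    }

    // Apply the mappings to one tuple, keeping only the selected (and possibly renamed) fields.
    static Map<String, Object> apply(Map<String, String> outToIn, Map<String, Object> tuple) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : outToIn.entrySet()) {
            if (tuple.containsKey(e.getValue())) {
                out.put(e.getKey(), tuple.get(e.getValue()));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String expr = "select(search(c, q=\"*:*\", fl=\"fielda,fieldb\", sort=\"fielda asc\"), fielda as outField, fieldb)";
        Map<String, String> mappings = fieldMappings(parameters(expr));
        Map<String, Object> tuple = new LinkedHashMap<>();
        tuple.put("fielda", 1);
        tuple.put("fieldb", 2);
        tuple.put("fieldc", 3);
        System.out.println(apply(mappings, tuple));
    }
}
```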

Embedded in a Search Engine

Having an AI language embedded in a search engine is a huge advantage. It allows expressions to leverage vast amounts of information in interesting ways. The inverted index already has important statistics about the text, which can be used for machine learning. Search engines have strong facilities for working with text (tokenizers, filters, etc.), and in recent years they've become powerful column stores for numeric calculations. They also have mature content ingestion and parallel query frameworks.

Now there is a language that ties it all together.
