Articles

Build Systems

May 9 2017 by oxnz

Autotools

configure.ac

AC_INIT([hello], [1.0], [bug@xxx.org])
AM_INIT_AUTOMAKE([-Wall -Werror foreign])
AM_PROG_AR
AC_PROG_RANLIB
AC_PROG_CC
AC_PROG_CXX
AC_CONFIG_FILES([
	Makefile
	echo/Makefile
	])
AC_OUTPUT

Makefile.am

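# Build the echo/ subdirectory first, then link hello against its archive.
SUBDIRS = echo
bin_PROGRAMS = hello
hello_SOURCES = hello.cpp
# Libraries and -l options belong in LDADD; LDFLAGS is for other linker flags.
hello_LDADD = echo/echo.a -lstdc++ -lpthread
AM_CPPFLAGS = -I$(srcdir)/include
AM_CXXFLAGS = -std=c++11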

autoreconf --install
./configure
make

CMake

CMakeLists.txt

add_executable(test test.cc)

cmake .
make

TensorFlow

January 22 2017 by oxnz

TensorFlow
Query Intent Detection

January 16 2017 by oxnz

Introduction

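Modern search engines no longer simply retrieve documents. To meet users' information needs, they have to return precise answers, which in turn requires a deeper understanding of the user's query. Identifying the intent behind the query is a key step toward this goal. It not only helps enrich search results semantically, but also helps rank them (for example, in vertical search engines).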

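Intent detection is challenging. Because queries are usually short, identifying the exact intent of the user requires more context than the keywords alone. In addition, the number of intent classes is usually very large. Many intent detection approaches rely on substantial human effort to meet these challenges, either by defining patterns for each intent class or by defining discriminative features for queries to run statistical models. In contrast, a statistical method is proposed here that extracts discriminative features for queries automatically.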
Linear Algebra

January 13 2017 by oxnz

Linear Algebra

Multiplying a matrix by a vector yields a linear combination of the matrix's columns, with the vector's entries as the coefficients.
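
The column view of the matrix-vector product can be checked with a small sketch. The function names and the sample matrix below are made up for illustration. The first function builds the product by scaling each column by the matching vector entry and summing, and the second uses the usual row-by-row dot products, so the two can be compared.

```python
def matvec_as_column_combination(matrix, vector):
    """Compute A x as sum over j of x[j] * (column j of A)."""
    n_rows = len(matrix)
    result = [0] * n_rows
    for j, coeff in enumerate(vector):
        # Extract column j of the matrix (stored as a list of rows).
        column = [matrix[i][j] for i in range(n_rows)]
        # Add coeff * column to the running linear combination.
        for i in range(n_rows):
            result[i] += coeff * column[i]
    return result


def matvec_by_rows(matrix, vector):
    """Compute A x the usual way: one dot product per row."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


# Hypothetical 3x2 example: columns are [1, 3, 5] and [2, 4, 6].
A = [[1, 2],
     [3, 4],
     [5, 6]]
x = [10, 1]

# 10 * [1, 3, 5] + 1 * [2, 4, 6]
print(matvec_as_column_combination(A, x))
print(matvec_by_rows(A, x))
```

Both calls print the same vector, which is the point: the row-wise and column-wise readings are two descriptions of one product.
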
TO DO LIST - 2017

January 1 2017 by oxnz

Math
- Single Variable Calculus (18.01)
- Differential Equations (18.03)
  - ODE (Ordinary Differential Equation)
- Linear Algebra (18.06)
NLP
- Synonyms
- Text relevance
- relevance beyond text
- machine learning
  - neural networks and deep learning
Systems and Frameworks
- Elasticsearch
- Kibana
- Logstash
- Redis
- IK Analyzer (tokenizer)
Reading

Books
- Lucene In Action
- Information Retrieval
Source Code
- Elasticsearch
- Lucene
- Logstash
- Kibana
Relevance Beyond Text

December 30 2016 by oxnz

Points

Text-based ranking measures are necessary but not sufficient for high-quality retrieval. It is extremely important to confirm intuition with experiments.
- Prefer multiplicative boosting to additive boosting
- Apply a boost function based on some static document attribute
- Use a DocumentRank (e.g. quality, length), analogous to PageRank
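
The points above can be illustrated with a toy comparison of the two boosting styles. Everything here is an assumption made for the sketch: the scores, the document names, and the boost function `1 + quality` are invented, and the rationale shown (an additive boost can let a high-quality but off-topic document outrank a relevant one, while a multiplicative boost scales with text relevance) is one common argument, not necessarily the one the original notes had in mind.

```python
def additive_boost(text_score, quality, weight=1.0):
    """Final score = text score plus a weighted static attribute."""
    return text_score + weight * quality


def multiplicative_boost(text_score, quality):
    """Final score = text score scaled by a boost function of the attribute."""
    return text_score * (1.0 + quality)


def rank(docs, scorer):
    """Return document names ordered by descending boosted score."""
    ordered = sorted(docs, key=lambda d: scorer(d[1], d[2]), reverse=True)
    return [name for name, _text_score, _quality in ordered]


# Hypothetical (name, text relevance score, static quality attribute) triples.
DOCS = [
    ("relevant", 8.0, 0.2),
    ("popular_offtopic", 0.5, 9.0),
]

print(rank(DOCS, additive_boost))        # off-topic document wins on quality alone
print(rank(DOCS, multiplicative_boost))  # text relevance still dominates
```

In line with the advice to confirm intuition with experiments, a sketch like this only shows the mechanism; the actual boost function and its effect would have to be validated against real queries and judgments.
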
PostgreSQL

December 30 2016 by oxnz

Full Text Search
- Stemming
- Ranking / Boost
- Multi-language support
- Fuzzy search (misspellings)
- Accent support
Elasticsearch - Relevancy

December 30 2016 by oxnz

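Relevance is hard because search is a user interaction with a severe information asymmetry. Nearly all of the interaction is confined to a single search box, and the user supplies only a handful of query terms, while the data being searched is usually massive and varies widely in type and quality. Yet the user expects the results to be presented in non-increasing order of relevance.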

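To improve search accuracy, contextual information can be used to help the search engine: for example, the page the user is on when searching, the user's preferences (language, geographic location), and the accumulated user history.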

There are also domain-specific approaches, for example for time-sensitive content such as news and media.

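Relevant search results are not only a convenience for users; they also subtly influence users and can even guide them, for example through recommended content.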

Logstash

December 26 2016 by oxnz

Logstash performance and configuration
Information Retrieval - Similarity

December 20 2016 by oxnz

Table of Contents
TF/IDF

BM25
