In the last post, I demonstrated how to set up a PySpark environment on an EC2 instance. In this post, I briefly introduce what a data pipeline is and demonstrate, with examples, how to build a relational data model from unstructured or semi-structured data.

Data Pipeline

Generally speaking, a data pipeline is “a set of data processing elements connected in series, where the output of one element is the input of the next one”.
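To make the definition concrete, here is a minimal sketch in plain Python of elements connected in series. The helper `build_pipeline` and the three stages (`parse`, `keep_valid`, `to_records`) are hypothetical names chosen for illustration; they are not taken from the original post or from any library.

```python
from functools import reduce


def build_pipeline(*stages):
    """Chain stages so that each stage's output is the next stage's input."""
    def run(data):
        return reduce(lambda acc, stage: stage(acc), stages, data)
    return run


# Hypothetical stages, for illustration only.
def parse(lines):
    # Split each raw comma-separated line into fields.
    return [line.strip().split(",") for line in lines]


def keep_valid(rows):
    # Keep rows that have exactly two fields and a numeric second field.
    return [row for row in rows if len(row) == 2 and row[1].isdigit()]


def to_records(rows):
    # Turn each validated row into a structured record.
    return [{"name": name, "age": int(age)} for name, age in rows]


pipeline = build_pipeline(parse, keep_valid, to_records)
records = pipeline(["alice,30\n", "bob,unknown\n", "carol,41\n"])
print(records)
```

Each stage only needs to agree with its neighbours on the shape of the data it receives and returns, which is what allows stages to be swapped or reordered.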


In terms of architecture, there are two types of data pipelines: batch data pipelines and stream data pipelines. …



Apache Spark is a powerful analytics engine for big data processing that has grown increasingly popular across the big data industry. Spark includes built-in modules for streaming, SQL, machine learning and graph processing, and provides high-level APIs in Java, Scala, Python and R.

PySpark is the Python API for Spark. It is well suited to performing exploratory data analysis at scale, building machine learning pipelines, creating ETL pipelines for data platforms, and more. …


Timing a function is one way to evaluate its performance. This post demonstrates how to use a decorator to time a function.

Timing a Function Directly

Timing a function directly is straightforward. The code snippet below demonstrates how to do it.

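import time

start = time.time()                # wall-clock time before the call
result = function_to_be_timed()    # placeholder for the function under test
end = time.time()                  # wall-clock time after the call
print(f'function_to_be_timed() takes {end - start} seconds.')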

Timing a Function Using a Decorator

With a decorator, the timing code can be reused across functions. Below is an example implementation of a timer decorator.

def timed(fnc):
    from functools import wraps
    import time

    @wraps(fnc)
    def wrapper(*args, **kwargs):
        start = time.time()
        result = fnc(*args, **kwargs)
        end = time.time()
        # Build a printable argument list; kwargs.items() yields (key, value) pairs.
        arg_str = ','.join([str(arg) for arg in args] +
                           ['{0}={1}'.format(k, v) for (k, v) in kwargs.items()])
        fnc_str = fnc.__name__ + '(' + arg_str + ')'
        print(f'{fnc_str} takes {end - start} seconds.') …
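The excerpt above is cut off, so the following is only a sketch of what a complete timer decorator might look like, not the author's full code. It assumes the wrapper should return the wrapped function's result, and it swaps in `time.perf_counter()` in place of `time.time()`, since it is a monotonic clock intended for measuring short durations. The function `slow_add` is a hypothetical example.

```python
import time
from functools import wraps


def timed(fnc):
    """Print how long each call to fnc takes, then return its result."""
    @wraps(fnc)  # preserve fnc's name and docstring on the wrapper
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = fnc(*args, **kwargs)
        elapsed = time.perf_counter() - start
        # Render positional and keyword arguments for the log line.
        arg_str = ", ".join(
            [repr(arg) for arg in args]
            + [f"{key}={value!r}" for key, value in kwargs.items()]
        )
        print(f"{fnc.__name__}({arg_str}) takes {elapsed:.6f} seconds.")
        return result  # assumption: callers still need the return value
    return wrapper


@timed
def slow_add(a, b, delay=0.01):
    # Hypothetical function used only to exercise the decorator.
    time.sleep(delay)
    return a + b


total = slow_add(1, 2, delay=0.02)
```

Because of `@wraps`, the decorated function keeps its original `__name__`, which is what makes the printed message readable.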

In some practical cases, we need to control how often a specific function can be invoked. This post demonstrates how this can be done in Python with a decorator.

Assume we don't want the function to be called more than a certain number of times within a given duration. In the closure the decorator returns, we define a list that stores the times at which the function is called. For convenience, we call each such time an invocation time. Once the list holds the maximum number of invocation times allowed within the given duration, the difference between the current invocation time and the first one in the list is checked. If it is less than the given duration, the function is not invoked. Otherwise, the current invocation time is appended to the list, and the first item in the list is removed. …
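The approach described above can be sketched as follows. This is a minimal illustration rather than the post's actual code: the names `rate_limited`, `max_calls`, `period` and the injectable `clock` parameter are assumptions, and returning `None` for a skipped call is one possible choice among several.

```python
import time
from functools import wraps


def rate_limited(max_calls, period, clock=time.monotonic):
    """Allow at most max_calls invocations within any window of `period` seconds.

    `clock` is a hypothetical hook that lets the time source be replaced.
    """
    if max_calls < 1:
        raise ValueError("max_calls must be at least 1")

    def decorator(fnc):
        invocation_times = []  # times of the most recent accepted calls

        @wraps(fnc)
        def wrapper(*args, **kwargs):
            now = clock()
            if len(invocation_times) >= max_calls:
                # The list is full: compare against the oldest accepted call.
                if now - invocation_times[0] < period:
                    return None  # limit reached, so the call is skipped
                invocation_times.pop(0)
            invocation_times.append(now)
            return fnc(*args, **kwargs)
        return wrapper
    return decorator


# Usage with a fake clock so the behaviour does not depend on real time.
fake_now = [0.0]


@rate_limited(max_calls=2, period=1.0, clock=lambda: fake_now[0])
def ping():
    return "pong"


results = [ping(), ping(), ping()]  # the third call falls inside the window
fake_now[0] = 1.5                   # advance past the window
results.append(ping())
print(results)
```

Removing items from the front of a plain list is fine for small limits; for large values of `max_calls`, a `collections.deque` would be the more natural container.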


Jan. 18. 2016

QtCreator is a very convenient IDE for developing Qt GUI programs. Using an example under Ubuntu Linux 14.04, this note demonstrates an alternative way to develop Qt GUI programs without QtCreator.

First, use any text editor, such as gedit or Notepad++, to edit your source code. For simplicity, we write everything in a single .cpp file as follows.

// main.cpp
#include <QApplication>
#include <QTreeView>
#include <QStandardItemModel>
using namespace std;

int main(int argc, char* argv[])
{
    QApplication a(argc, argv);
    QTreeView *tree = new QTreeView;
    QStandardItemModel model(3, 3);
    for (int i = 0; i < 3; i++)
    {
        for (int j = 0; j < 3; j++)
        {
            QStandardItem *item = new QStandardItem(QString("Row:%1, Column:%2").arg(i).arg(j));
            if (j == 0)
            {
                // Only items in the first column get child rows.
                for (int k = 0; k < 3; k++)
                {
                    item->appendRow(new QStandardItem(QString("Item %1").arg(i)));
                }
            }
            // Place every item in the model, not only those in column 0.
            model.setItem(i, j, item);
        }
    }
    tree->setModel(&model);
    tree->show();
    return a.exec(); …

Jan. 21. 2016

Roughly speaking, a build in software development is the process of "translating" source code files into executable binary code files[1], and a build system is a collection of software tools used to facilitate the build process[2]. Although different build systems have been "invented" and used for over three decades, the core algorithms in most of them have not changed much since Make first introduced the directed acyclic graph (DAG)[3]. Popular build systems nowadays include the classical GNU Make, CMake, QMake, Ninja, Ant, SCons, and many others. …


Dec. 30. 2015

The factory method is a very important design pattern in object-oriented design[1]. The main goal of this pattern is to allow the client to create objects without having to specify the details of the objects to create. It returns an instance of one of several possible classes that share a common superclass. This post does not discuss the factory pattern in C++ or any other OOP language. Instead, it discusses a similar implementation of the pattern in the C programming language.

Function Factory: prototype

A function factory has two slightly different prototypes:

return_type (*function_factory_name(fac_type1 fac_param1, fac_type2 fac_param2, …))(fun_type1 fun_param1, fun_type2 fun_param2…

Nov. 30. 2015

A function that takes a variable number of arguments is called a variadic function, also known as a variable argument function [1]. Because the length of its argument list is flexible, it is especially useful in system programming. Common use cases include summing numbers, concatenating strings, and so on. Typical examples include printf in the C programming language, execl and execlp in Unix, _execl and _execlp in Windows, and many others. This post introduces how to declare, define and use such functions.

How to declare a variadic function

The declaration of a variadic function is almost the same as that of an "ordinary" function, except that its formal parameter list ends with an ellipsis. …


Nov. 23. 2015

In this post, after introducing two preliminary concepts, I explain what virtual inheritance is, and how it is implemented, then introduce two applications of virtual inheritance. One involves multiple inheritance, and the other involves implementation of non-inheritability. For each case, one example demonstrating how virtual inheritance works is presented.

Some Preliminaries

Before discussing virtual inheritance, it is necessary to explain two very important concepts in OOP (Object Oriented Programming): static binding and dynamic binding.

Roughly speaking, static binding occurs at compile time, and dynamic binding occurs at run time. In C++, two kinds of polymorphism (see Appendix for Classification of Polymorphism), overloading and overriding, are typical examples of these two concepts respectively. For function overloading, when an overloaded function is called, the compiler determines at compile time which version is actually called by matching the parameter types. For function overriding, by contrast, C++ resolves virtual function calls at run time through a vtable data structure [4]. In C++, virtual inheritance is also implemented with a vtable. …

About

Chuan Zhang

Senior Software Engineer
