---
layout: post
title: "SQL Errors Summary"
category: SQL
tags: SQL
---
MySQL Error

Error Code: 1046. No database selected. Select the default DB to be used by double-clicking its name in the SCHEMAS list in the sidebar.

You have to tell MySQL which database you are going to use:
```sql
SHOW DATABASES;
USE mysql;
```
The following example works in MySQL.
```sql
SHOW DATABASES;
USE mysql;

CREATE TABLE test (
  id INT(11),
  name VARCHAR(25)
);

INSERT INTO test VALUES(1, "23");
INSERT INTO test VALUES(1, "测试");
INSERT INTO test VALUES(1, "12测试");

-- names that start with a Chinese character (first byte > 127)
SELECT name FROM test WHERE ascii(name) > 127;

-- names that do not start with a Chinese character
SELECT name FROM test WHERE ascii(name) < 127;

-- names containing Chinese characters
-- ('%[吖-座]%' with LIKE is SQL Server syntax; in MySQL 8.0+ use a REGEXP range)
SELECT * FROM test WHERE name REGEXP '[吖-座]';

-- ASCII codes: digits 48-57, letters 65-90 and 97-122;
-- multi-byte characters such as Chinese use byte values above 127
-- find names that start with a digit (e.g. before deleting them)
SELECT * FROM test WHERE ascii(name) BETWEEN 48 AND 57;

-- pre-8.0 MySQL REGEXP is byte-based, so this pattern is read literally:
-- a match means the name contains ASCII digits or letters, hence "not Chinese"
SELECT name,
  CASE name REGEXP "[u0391-uFFE5]"
    WHEN 1 THEN "not Chinese characters"
    ELSE "Chinese characters"
  END AS "is it Chinese?"
FROM test;

-- LENGTH() counts bytes: in utf8 a Chinese character takes three bytes,
-- a digit or letter takes one
-- CHAR_LENGTH() counts characters: Chinese characters, digits, and letters all count as one
-- check whether a name contains Chinese characters
SELECT * FROM test
WHERE LENGTH(name) > CHAR_LENGTH(name);
```
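The same byte-versus-character distinction can be checked outside MySQL. A quick sketch in Python 2 (matching the Python used later in this post), with the test value '测试' from the table above:

```python
# -*- coding: utf-8 -*-
s = u'测试'
print len(s)                  # 2 characters, like CHAR_LENGTH()
print len(s.encode('utf-8'))  # 6 bytes, like LENGTH() on a utf8 column
```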
Python block for reading big data in chunks and handling it:
```python
import pandas as pd
import time

start = time.clock()
reader = pd.read_csv('data.csv', iterator=True, low_memory=False)
loop = True
chunkSize = 5000000
chunks = []
while loop:
    try:
        chunk = reader.get_chunk(chunkSize)
        chunks.append(chunk)
    except StopIteration:
        loop = False
        print "Iteration is stopped."
df = pd.concat(chunks, ignore_index=True)
print time.clock() - start
```
```python
# Chinese characters in Python 2 source and strings: declare the encoding at the
# top of the file, otherwise you may hit
# UnicodeEncodeError: 'ascii' codec can't encode characters in position ...
# -*- coding: utf-8 -*-
import sys
reload(sys)
sys.setdefaultencoding('utf8')

# export a DataFrame with Chinese text to csv (avoids encoding problems)
df.to_csv(path + '\\' + filename + '.csv', encoding='gb18030', index=False)

# DataFrame to list
company_list = companys.values.tolist()

# print a list of Chinese strings without escape sequences
print str(company_list).replace('[', '').replace(']', '').decode('string_escape')

# read a csv containing Chinese text
df = pd.read_csv("data.csv", encoding='gbk')

# DataFrame to dictionary
data = full_data.set_index('id')['values'].to_dict()

# set to list
l = list(set([0, 1, 2]))
```
---
layout: post
title: "SQL Update Table Operations"
category: SQL
tags: SQL
---
These statements work in PostgreSQL.
```sql
-- add a column
alter table cece_table add name varchar(100);

-- rename a column
alter table cece_table rename column name to another_name;

-- update the values in one column ('个人' means "individual")
update cece_table set name = '个人' where length(number) < 4;

-- update a column from another table
update cece_table set name = table2.name
from table2
where cece_table.id = table2.id;

-- delete some rows
delete from cece_table where id in (select id from table2);
```
With too many features, most of them correlated with each other, models suffer from poor accuracy, and large volumes of data become hard to handle.

PCA is a method of extracting important variables from a large set of variables in a dataset. It extracts a low-dimensional set of features from high-dimensional data with the motive of capturing as much information as possible.
A principal component is a normalized linear combination of the original predictors in a data set.
Let's say we have a set of predictors $X_1, X_2, \ldots, X_p$.

The first principal component can be written as:

$$Z_1 = \phi_{11}X_1 + \phi_{21}X_2 + \phi_{31}X_3 + \cdots + \phi_{p1}X_p$$

where,

$Z_1$ is the first principal component.

$\phi_1 = (\phi_{11}, \phi_{21}, \ldots, \phi_{p1})$ is the loading vector comprising the loadings of the first principal component. The loadings are constrained so that their sum of squares equals one, $\sum_{j=1}^{p} \phi_{j1}^2 = 1$, because loadings of large magnitude could otherwise inflate the variance arbitrarily. The loading vector also defines the direction of the principal component $Z_1$, along which the data vary the most; it yields the line in $p$-dimensional space that is closest to the $n$ observations, where closeness is measured by average squared Euclidean distance.

$X_1, \ldots, X_p$ are normalized predictors, each with mean zero and standard deviation one.
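To make this concrete, here is a minimal sketch using scikit-learn (assuming it is installed; the data array `X` is made up for illustration). It standardizes the predictors, fits PCA, and checks that the first loading vector has unit sum of squares and that $Z_1$ is exactly the linear combination above:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# made-up data: n = 100 observations, p = 5 predictors, two of them correlated
rng = np.random.RandomState(0)
X = rng.randn(100, 5)
X[:, 1] = X[:, 0] + 0.1 * rng.randn(100)

# normalize the predictors: mean zero, standard deviation one
X_std = StandardScaler().fit_transform(X)

pca = PCA()
Z = pca.fit_transform(X_std)      # principal component scores
phi1 = pca.components_[0]         # loading vector of the first principal component

print(np.sum(phi1 ** 2))                      # sum of squared loadings: 1.0
print(np.allclose(Z[:, 0], X_std.dot(phi1)))  # Z1 is the linear combination: True
```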
As noted above, the results of PCA depend on the scaling of the variables. A scale-invariant form of PCA has been developed.
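A quick way to see the scale dependence: run PCA on the same data with and without standardization and compare the leading component. The sketch below uses made-up data in which one column is measured in much larger units; standardizing first (equivalent to PCA on the correlation matrix) is the usual way to remove this dependence.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.RandomState(1)
X = rng.randn(200, 3)
X[:, 2] *= 100.0   # one predictor measured on a much larger scale

# PCA on the raw data: the large-scale column dominates the first component
print(PCA(n_components=1).fit(X).components_[0])

# PCA after standardizing: the first component no longer depends on the units
X_std = StandardScaler().fit_transform(X)
print(PCA(n_components=1).fit(X_std).components_[0])
```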