
Python Tip of the Day

Decomposing collections

Suppose we have a function that returns a tuple of two values and we want to assign each value to a separate variable. One way is to use indexing, as shown below:

abc = (5, 10)
x = abc[0]
y = abc[1]
print(x, y)

Output

5 10

There is a better option that allows us to do the same operation in one line

x, y = abc
print(x, y)

Output

5 10

It can be extended to tuples with more than two values, and to other data structures such as lists or sets, as shown below.
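A minimal sketch extending the idea; the variable names and values here are made up for illustration:

# Unpacking a tuple with more than two values
point = (1, 2, 3)
x, y, z = point
print(x, y, z)               # 1 2 3

# The same syntax works for lists
first, second, third = ["a", "b", "c"]
print(first, second, third)  # a b c

# A starred target collects whatever is left over into a list
head, *rest = (5, 10, 15, 20)
print(head, rest)            # 5 [10, 15, 20]

Note that sets are unordered, so x, y = {5, 10} is legal, but the order in which the values are assigned is not guaranteed.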

The pandas I/O API is a set of top level reader functions accessed like pandas.read_csv() that generally return a pandas object. The corresponding writer functions are object methods that are accessed like DataFrame.to_csv(). Below is a table containing available readers and writers.

Format Type | Data Description     | Reader         | Writer
text        | CSV                  | read_csv       | to_csv
text        | Fixed-Width Text File| read_fwf       |
text        | JSON                 | read_json      | to_json
text        | HTML                 | read_html      | to_html
text        | LaTeX                |                | Styler.to_latex
text        | XML                  | read_xml       | to_xml
text        | Local clipboard      | read_clipboard | to_clipboard
binary      | MS Excel             | read_excel     | to_excel
binary      | OpenDocument         | read_excel     |
binary      | HDF5 Format          | read_hdf       | to_hdf
binary      | Feather Format       | read_feather   | to_feather
binary      | Parquet Format       | read_parquet   | to_parquet
binary      | ORC Format           | read_orc       | to_orc
binary      | Stata                | read_stata     | to_stata
binary      | SAS                  | read_sas       |
binary      | SPSS                 | read_spss      |
binary      | Python Pickle Format | read_pickle    | to_pickle
SQL         | SQL                  | read_sql       | to_sql
SQL         | Google BigQuery      | read_gbq       | to_gbq

Here is an informal performance comparison for some of these IO methods.

Note

For examples that use the StringIO class, make sure you import it with from io import StringIO for Python 3.

CSV & text files

The workhorse function for reading text files (a.k.a. flat files) is read_csv(). See the cookbook for some advanced strategies.

Parsing options

read_csv() accepts the following common arguments:

Basic

filepath_or_buffer : various

Either a path to a file (a str, pathlib.Path, or py._path.local.LocalPath), URL (including http, ftp, and S3 locations), or any object with a read() method (such as an open file or StringIO).

sep : str, defaults to ',' for read_csv(), '\t' for read_table()

Delimiter to use. If sep is None, the C engine cannot automatically detect the separator, but the Python parsing engine can, meaning the latter will be used and automatically detect the separator by Python's builtin sniffer tool, csv.Sniffer. In addition, separators longer than 1 character and different from '\s+' will be interpreted as regular expressions and will also force the use of the Python parsing engine. Note that regex delimiters are prone to ignoring quoted data. Regex example: '\r\t'.
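For instance, a small sketch of the two behaviours described above; the inline data strings are made up for illustration:

from io import StringIO

import pandas as pd

# sep=None lets the Python engine sniff the delimiter (here ';')
data = "a;b;c\n1;2;3\n4;5;6"
df = pd.read_csv(StringIO(data), sep=None, engine="python")
print(df)

# A separator longer than one character is treated as a regular expression
data2 = "a|||b|||c\n1|||2|||3"
df2 = pd.read_csv(StringIO(data2), sep=r"\|\|\|", engine="python")
print(df2)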

delimiter : str, default None

Alternative argument name for sep.

delim_whitespace : boolean, default False

Specifies whether or not whitespace (e.g. ' ' or '\t') will be used as the delimiter. Equivalent to setting sep='\s+'. If this option is set to True, nothing should be passed in for the delimiter parameter.
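A brief sketch of whitespace-delimited parsing; the inline data is made up for illustration:

from io import StringIO

import pandas as pd

# Columns separated by runs of spaces; equivalent to sep=r"\s+"
data = "a b  c\n1  2 3\n4 5  6"
df = pd.read_csv(StringIO(data), delim_whitespace=True)
print(df)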

Column and index locations and names

header : int or list of ints, default 'infer'

Row number(s) to use as the column names, and the start of the data. Default behavior is to infer the column names: if no names are passed the behavior is identical to header=0 and column names are inferred from the first line of the file; if column names are passed explicitly then the behavior is identical to header=None. Explicitly pass header=0 to be able to replace existing names.

The header can be a list of ints that specify row locations for a MultiIndex on the columns, e.g. [0, 1, 3]. Intervening rows that are not specified will be skipped (e.g. 2 in this example is skipped). Note that this parameter ignores commented lines and empty lines if skip_blank_lines=True, so header=0 denotes the first line of data rather than the first line of the file.
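A hedged sketch of a list of header rows building a column MultiIndex; the inline data is invented here:

from io import StringIO

import pandas as pd

# Rows 0 and 1 form a two-level column MultiIndex; data starts on the next row
data = "group,group,other\nx,y,z\n1,2,3\n4,5,6"
df = pd.read_csv(StringIO(data), header=[0, 1])
print(df.columns)  # MultiIndex with tuples like ('group', 'x')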

names : array-like, default None

List of column names to use. If the file contains no header row, then you should explicitly pass header=None. Duplicates in this list are not allowed.
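For example, a minimal sketch of supplying names for a headerless file (the column names are chosen for illustration):

from io import StringIO

import pandas as pd

# The file has no header row, so pass header=None and provide the names explicitly
data = "1,2,3\n4,5,6"
df = pd.read_csv(StringIO(data), header=None, names=["a", "b", "c"])
print(df)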

index_col : int, str, sequence of int / str, or False, optional, default None

Column(s) to use as the row labels of the DataFrame, either given as string name or column index. If a sequence of int / str is given, a MultiIndex is used.

Note

index_col=False can be used to force pandas to not use the first column as the index, e.g. when you have a malformed file with delimiters at the end of each line.
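A short sketch of such a malformed file; the inline data is made up for illustration:

from io import StringIO

import pandas as pd

# Each data line ends with an extra comma, so every row has one more field than the header.
# Without index_col=False, pandas would use the first column as the row index.
data = "a,b,c\n1,2,3,\n4,5,6,"
df = pd.read_csv(StringIO(data), index_col=False)
print(df)  # columns a, b, c with a default 0..n index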

The default value of None instructs pandas to guess. If the number of fields in the column header row is equal to the number of fields in the body of the data file, then a default index is used. If it is larger, then the first columns are used as index so that the remaining number of fields in the body are equal to the number of fields in the header.

The first row after the header is used to determine the number of columns, which will go into the index. If the subsequent rows contain less columns than the first row, they are filled with NaN.

This can be avoided through usecols. This ensures that the columns are taken as is and the trailing data are ignored.

usecols : list-like or callable, default None

Return a subset of the columns. If list-like, all elements must either be positional (i.e. integer indices into the document columns) or strings that correspond to column names provided either by the user in names or inferred from the document header row(s). If names are given, the document header row(s) are not taken into account. For example, a valid list-like usecols parameter would be [0, 1, 2] or ['foo', 'bar', 'baz'].

Element order is ignored, so usecols=[0, 1] is the same as [1, 0]. To instantiate a DataFrame from data with element order preserved use pd.read_csv(data, usecols=['foo', 'bar'])[['foo', 'bar']] for columns in ['foo', 'bar'] order or pd.read_csv(data, usecols=['foo', 'bar'])[['bar', 'foo']] for ['bar', 'foo'] order.
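A brief sketch of the reindexing idiom mentioned above; the inline data is for illustration only:

from io import StringIO

import pandas as pd

data = "foo,bar,baz\n1,2,3\n4,5,6"

# usecols ignores element order, so select first and then reindex the columns
df = pd.read_csv(StringIO(data), usecols=["foo", "bar"])[["bar", "foo"]]
print(df)  # columns appear in ['bar', 'foo'] order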

If callable, the callable function will be evaluated against the column names, returning names where the callable function evaluates to True:

In [1]: import pandas as pd

In [2]: from io import StringIO

In [3]: data = "col1,col2,col3\na,b,1\na,b,2\nc,d,3"

In [4]: pd.read_csv(StringIO(data))
Out[4]: 
  col1 col2  col3
0    a    b     1
1    a    b     2
2    c    d     3

In [5]: pd.read_csv(StringIO(data), usecols=lambda x: x.upper() in ["COL1", "COL3"])
Out[5]: 
  col1  col3
0    a     1
1    a     2
2    c     3

Using this parameter results in much faster parsing time and lower memory usage when using the c engine. The Python engine loads the data first before deciding which columns to drop

squeeze : boolean, default False

If the parsed data only contains one column then return a Series.

Deprecated since version 1.4.0: Append .squeeze("columns") to the call to read_csv to squeeze the data.
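For example, a minimal sketch of the replacement idiom; the inline data is invented here:

from io import StringIO

import pandas as pd

# A single-column file; .squeeze("columns") turns the one-column DataFrame into a Series
data = "a\n1\n2\n3"
s = pd.read_csv(StringIO(data)).squeeze("columns")
print(type(s))  # <class 'pandas.core.series.Series'>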

prefix : str, default None

Prefix to add to column numbers when no header, e.g. 'X' for X0, X1, …

Deprecated since version 1.4.0: Use a list comprehension on the DataFrame's columns after calling read_csv.

In [6]: data = "col1,col2,col3\na,b,1"

In [7]: df = pd.read_csv(StringIO(data))

In [8]: df.columns = [f"pre_{col}" for col in df.columns]

In [9]: df
Out[9]: 
  pre_col1 pre_col2  pre_col3
0        a        b         1

mangle_dupe_cols : boolean, default True

Duplicate columns will be specified as 'X', 'X.1'…'X.N', rather than 'X'…'X'. Passing in False will cause data to be overwritten if there are duplicate names in the columns.

Deprecated since version 1.5.0: The argument was never implemented, and a new argument where the renaming pattern can be specified will be added instead.

General parsing configuration

dtype : Type name or dict of column -> type, default None

Data type for data or columns. E.g. {'a': np.float64, 'b': np.int32, 'c': 'Int64'}. Use str or object together with suitable na_values settings to preserve and not interpret dtype. If converters are specified, they will be applied INSTEAD of dtype conversion. For example:

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

New in version 1.5.0: Support for defaultdict was added. Specify a defaultdict as input where the default determines the dtype of the columns which are not explicitly listed.

engine : {'c', 'python', 'pyarrow'}

Parser engine to use. The C and pyarrow engines are faster, while the python engine is currently more feature-complete. Multithreading is currently only supported by the pyarrow engine

New in version 1.4.0: The "pyarrow" engine was added as an experimental engine, and some features are unsupported, or may not work correctly, with this engine.
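A hedged sketch of selecting a parser engine; the inline data is made up, and the pyarrow call assumes the optional pyarrow package is installed, so it is left commented out:

from io import StringIO

import pandas as pd

data = "a,b,c\n1,2,3\n4,5,6"

df_c = pd.read_csv(StringIO(data), engine="c")        # default, fast C parser
df_py = pd.read_csv(StringIO(data), engine="python")  # more feature-complete
# Experimental as of pandas 1.4 and requires pyarrow to be installed:
# df_pa = pd.read_csv(StringIO(data), engine="pyarrow")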

converters : dict, default None

Dict of functions for converting values in certain columns. Keys can either be integers or column labels
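For instance, a small sketch of a per-column converter; the column names and the cleanup function are made up for illustration:

from io import StringIO

import pandas as pd

data = "name,price\nwidget, 1.50 \ngadget, 2.25 "

# The converter receives the raw string for each cell in 'price';
# here it strips whitespace and parses the value as a float
df = pd.read_csv(
    StringIO(data),
    converters={"price": lambda s: float(s.strip())},
)
print(df)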

true_values : list, default None

Values to consider as True.

false_values : list, default None

Values to consider as False.
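A short sketch showing the two options together; the 'Yes'/'No' tokens are chosen for illustration:

from io import StringIO

import pandas as pd

data = "flag\nYes\nNo\nYes"

# Map the strings 'Yes'/'No' onto booleans while parsing
df = pd.read_csv(StringIO(data), true_values=["Yes"], false_values=["No"])
print(df["flag"].tolist())  # [True, False, True]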

skipinitialspace : boolean, default False

Skip spaces after delimiter

skiprows : list-like or integer, default None

Line numbers to skip (0-indexed) or number of lines to skip (int) at the start of the file

If callable, the callable function will be evaluated against the row indices, returning True if the row should be skipped and False otherwise:

In [10]: data = "col1,col2,col3\na,b,1\na,b,2\nc,d,3"

In [11]: pd.read_csv(StringIO(data))
Out[11]: 
  col1 col2  col3
0    a    b     1
1    a    b     2
2    c    d     3

In [12]: pd.read_csv(StringIO(data), skiprows=lambda x: x % 2 != 0)
Out[12]: 
  col1 col2  col3
0    a    b     2

skipfooter : int, default 0

Number of lines at bottom of file to skip (unsupported with engine='c')

nrows : int, default None

Number of rows of file to read. Useful for reading pieces of large files

low_memory : boolean, default True

Internally process the file in chunks, resulting in lower memory use while parsing, but possibly mixed type inference. To ensure no mixed types either set False, or specify the type with the dtype parameter. Note that the entire file is read into a single DataFrame regardless; use the chunksize or iterator parameter to return the data in chunks. (Only valid with C parser)
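A minimal sketch of reading in chunks with the chunksize parameter mentioned above; the inline data is invented here:

from io import StringIO

import pandas as pd

data = "a,b\n1,2\n3,4\n5,6\n7,8"

# chunksize returns an iterator of DataFrames instead of one big DataFrame
with pd.read_csv(StringIO(data), chunksize=2) as reader:
    for chunk in reader:
        print(chunk.shape)  # (2, 2) for each chunk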

memory_map : boolean, default False

If a filepath is provided for filepath_or_buffer, map the file object directly onto memory and access the data directly from there. Using this option can improve performance because there is no longer any I/O overhead.

NA and missing data handling

na_values : scalar, str, list-like, or dict, default None

Additional strings to recognize as NA/NaN. If dict passed, specific per-column NA values. See below for a list of the values interpreted as NaN by default
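For example, a brief sketch of per-column NA values; the sentinel strings and data are invented here:

from io import StringIO

import pandas as pd

data = "a,b\n1,x\n-999,y\n3,missing"

# Treat -999 as NA only in column 'a' and the string 'missing' only in column 'b'
df = pd.read_csv(StringIO(data), na_values={"a": [-999], "b": ["missing"]})
print(df)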

keep_default_na : boolean, default True

Whether or not to include the default NaN values when parsing the data. Depending on whether na_values is passed in, the behavior is as follows (see the example after this list):

  • If

    In [13]: import numpy as np
    
    In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"
    
    In [15]: print(data)
    a,b,c,d
    1,2,3,4
    5,6,7,8
    9,10,11
    
    In [16]: df = pd.read_csv(StringIO(data), dtype=object)
    
    In [17]: df
    Out[17]: 
       a   b   c    d
    0  1   2   3    4
    1  5   6   7    8
    2  9  10  11  NaN
    
    In [18]: df["a"][0]
    Out[18]: '1'
    
    In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})
    
    In [20]: df.dtypes
    Out[20]: 
    a      int64
    b     object
    c    float64
    d      Int64
    dtype: object
    
    96 is
    In [13]: import numpy as np
    
    In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"
    
    In [15]: print(data)
    a,b,c,d
    1,2,3,4
    5,6,7,8
    9,10,11
    
    In [16]: df = pd.read_csv(StringIO(data), dtype=object)
    
    In [17]: df
    Out[17]: 
       a   b   c    d
    0  1   2   3    4
    1  5   6   7    8
    2  9  10  11  NaN
    
    In [18]: df["a"][0]
    Out[18]: '1'
    
    In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})
    
    In [20]: df.dtypes
    Out[20]: 
    a      int64
    b     object
    c    float64
    d      Int64
    dtype: object
    
    32, and
    In [13]: import numpy as np
    
    In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"
    
    In [15]: print(data)
    a,b,c,d
    1,2,3,4
    5,6,7,8
    9,10,11
    
    In [16]: df = pd.read_csv(StringIO(data), dtype=object)
    
    In [17]: df
    Out[17]: 
       a   b   c    d
    0  1   2   3    4
    1  5   6   7    8
    2  9  10  11  NaN
    
    In [18]: df["a"][0]
    Out[18]: '1'
    
    In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})
    
    In [20]: df.dtypes
    Out[20]: 
    a      int64
    b     object
    c    float64
    d      Int64
    dtype: object
    
    73 are specified,
    In [13]: import numpy as np
    
    In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"
    
    In [15]: print(data)
    a,b,c,d
    1,2,3,4
    5,6,7,8
    9,10,11
    
    In [16]: df = pd.read_csv(StringIO(data), dtype=object)
    
    In [17]: df
    Out[17]: 
       a   b   c    d
    0  1   2   3    4
    1  5   6   7    8
    2  9  10  11  NaN
    
    In [18]: df["a"][0]
    Out[18]: '1'
    
    In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})
    
    In [20]: df.dtypes
    Out[20]: 
    a      int64
    b     object
    c    float64
    d      Int64
    dtype: object
    
    73 is appended to the default NaN values used for parsing

  • If

    In [13]: import numpy as np
    
    In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"
    
    In [15]: print(data)
    a,b,c,d
    1,2,3,4
    5,6,7,8
    9,10,11
    
    In [16]: df = pd.read_csv(StringIO(data), dtype=object)
    
    In [17]: df
    Out[17]: 
       a   b   c    d
    0  1   2   3    4
    1  5   6   7    8
    2  9  10  11  NaN
    
    In [18]: df["a"][0]
    Out[18]: '1'
    
    In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})
    
    In [20]: df.dtypes
    Out[20]: 
    a      int64
    b     object
    c    float64
    d      Int64
    dtype: object
    
    96 is
    In [13]: import numpy as np
    
    In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"
    
    In [15]: print(data)
    a,b,c,d
    1,2,3,4
    5,6,7,8
    9,10,11
    
    In [16]: df = pd.read_csv(StringIO(data), dtype=object)
    
    In [17]: df
    Out[17]: 
       a   b   c    d
    0  1   2   3    4
    1  5   6   7    8
    2  9  10  11  NaN
    
    In [18]: df["a"][0]
    Out[18]: '1'
    
    In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})
    
    In [20]: df.dtypes
    Out[20]: 
    a      int64
    b     object
    c    float64
    d      Int64
    dtype: object
    
    32, and
    In [13]: import numpy as np
    
    In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"
    
    In [15]: print(data)
    a,b,c,d
    1,2,3,4
    5,6,7,8
    9,10,11
    
    In [16]: df = pd.read_csv(StringIO(data), dtype=object)
    
    In [17]: df
    Out[17]: 
       a   b   c    d
    0  1   2   3    4
    1  5   6   7    8
    2  9  10  11  NaN
    
    In [18]: df["a"][0]
    Out[18]: '1'
    
    In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})
    
    In [20]: df.dtypes
    Out[20]: 
    a      int64
    b     object
    c    float64
    d      Int64
    dtype: object
    
    73 are not specified, only the default NaN values are used for parsing

  • If keep_default_na is False, and na_values are specified, only the NaN values specified in na_values are used for parsing

  • If keep_default_na is False, and na_values are not specified, no strings will be parsed as NaN (see the sketch after this list)
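To make the combinations above concrete, here is a minimal sketch; the column names and the "special_missing" marker are invented for illustration, not taken from the text above.

import pandas as pd
from io import StringIO

data = "id,value\n1,n/a\n2,NA\n3,special_missing\n4,7"

# Defaults: "n/a" and "NA" are already in the default NaN list,
# so "special_missing" survives as an ordinary string.
print(pd.read_csv(StringIO(data)))

# keep_default_na=True (the default) plus na_values: the marker is appended
# to the default list, so all three markers become NaN.
print(pd.read_csv(StringIO(data), na_values=["special_missing"]))

# keep_default_na=False plus na_values: only the listed marker becomes NaN;
# "n/a" and "NA" come through as plain strings.
print(pd.read_csv(StringIO(data), keep_default_na=False, na_values=["special_missing"]))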

Note that if na_filter is passed in as False, the keep_default_na and na_values parameters will be ignored

na_filter boolean, default True

Detect missing value markers (empty strings and the value of na_values). In data without any NAs, passing na_filter=False can improve the performance of reading a large file

verbose boolean, default False

Indicate number of NA values placed in non-numeric columns

skip_blank_lines boolean, default True

If True, skip over blank lines rather than interpreting them as NaN values

Datetime handling

parse_dates boolean or list of ints or names or list of lists or dict, default False.
  • If True -> try parsing the index

  • If [1, 2, 3] -> try parsing columns 1, 2, 3 each as a separate date column

  • If [[1, 3]] -> combine columns 1 and 3 and parse as a single date column

  • If {'foo': [1, 3]} -> parse columns 1, 3 as a date and call the result 'foo' (see the sketch after this list)
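A minimal sketch of the forms listed above; the year/month/day sample data is invented for illustration.

import pandas as pd
from io import StringIO

data = "date,year,month,day,value\n2021-01-02,2021,1,2,10\n2021-01-03,2021,1,3,20"

# Parse a single named column as dates.
df1 = pd.read_csv(StringIO(data), parse_dates=["date"])

# Combine columns 1, 2 and 3 (year, month, day) into one datetime column.
df2 = pd.read_csv(StringIO(data), parse_dates=[[1, 2, 3]])

# Same combination, but call the resulting column 'foo'.
df3 = pd.read_csv(StringIO(data), parse_dates={"foo": [1, 2, 3]})

print(df1.dtypes)
print(df2.columns.tolist())
print(df3.columns.tolist())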

Note

A fast-path exists for iso8601-formatted dates

infer_datetime_format boolean, default False

If True and parse_dates is enabled for a column, attempt to infer the datetime format to speed up the processing

keep_date_col boolean, default False

If True and parse_dates specifies combining multiple columns, then keep the original columns

date_parser function, default None

Function to use for converting a sequence of string columns to an array of datetime instances. The default uses dateutil.parser.parser to do the conversion. pandas will try to call date_parser in three different ways, advancing to the next if an exception occurs: 1) pass one or more arrays (as defined by parse_dates) as arguments; 2) concatenate (row-wise) the string values from the columns defined by parse_dates into a single array and pass that; and 3) call date_parser once for each row using one or more strings (corresponding to the columns defined by parse_dates) as arguments
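As a hedged sketch of the first calling convention, a date_parser built on pd.to_datetime with an explicit format; the day-first sample data and the format string are assumptions for illustration only.

import pandas as pd
from io import StringIO

data = "when,value\n02/01/2021,10\n03/01/2021,20"

# date_parser receives the raw string values and must return datetimes.
df = pd.read_csv(
    StringIO(data),
    parse_dates=["when"],
    date_parser=lambda col: pd.to_datetime(col, format="%d/%m/%Y"),
)
print(df["when"])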

dayfirst boolean, default False

DD/MM format dates, international and European format

cache_dates boolean, default True

If True, use a cache of unique, converted dates to apply the datetime conversion. May produce significant speed-up when parsing duplicate date strings, especially ones with timezone offsets

New in version 0.25.0.

Iteration

iterator boolean, default False

Return a TextFileReader object for iteration or for getting chunks with get_chunk()

chunksize int, default None

Return a TextFileReader object for iteration. See below
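A minimal sketch of both options; the four-row sample data is invented.

import pandas as pd
from io import StringIO

data = "a,b\n1,2\n3,4\n5,6\n7,8"

# chunksize returns a TextFileReader that yields DataFrames of up to 2 rows.
reader = pd.read_csv(StringIO(data), chunksize=2)
for chunk in reader:
    print(chunk)

# iterator=True returns the same kind of object; get_chunk() pulls rows on demand.
reader = pd.read_csv(StringIO(data), iterator=True)
print(reader.get_chunk(3))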

Quoting, compression, and file format

compression {'infer', 'gzip', 'bz2', 'zip', 'xz', 'zstd', None, dict}, default 'infer'

For on-the-fly decompression of on-disk data. If 'infer', then use gzip, bz2, zip, xz, or zstandard if filepath_or_buffer is path-like and ends in '.gz', '.bz2', '.zip', '.xz', or '.zst', respectively, and no decompression otherwise. If using 'zip', the ZIP file must contain only one data file to be read in. Set to None for no decompression. Can also be a dict with the key 'method' set to one of {'zip', 'gzip', 'bz2', 'zstd'}; other key-value pairs are forwarded to zipfile.ZipFile, gzip.GzipFile, bz2.BZ2File, or zstandard.ZstdDecompressor. As an example, compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1} could be passed for faster compression and to create a reproducible gzip archive.

Changed in version 1.1.0: the dict option was extended to support gzip and bz2.

Changed in version 1.2.0: previous versions forwarded dict entries for 'gzip' to gzip.open.
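Putting the compression options together, a short sketch; the file name out.csv.gz is an invented example and assumes write access to the working directory.

import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]})

# The dict form forwards compresslevel and mtime to gzip.GzipFile,
# giving faster compression and a reproducible archive.
df.to_csv("out.csv.gz", compression={"method": "gzip", "compresslevel": 1, "mtime": 1})

# On reading, the default compression='infer' picks gzip from the '.gz' suffix.
print(pd.read_csv("out.csv.gz", index_col=0))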

thousands str, default None

Thousands separator

decimal str, default '.'

Character to recognize as the decimal point. E.g. use ',' for European data
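A minimal sketch of thousands and decimal together on European-style numbers; the semicolon-separated sample data is invented.

import pandas as pd
from io import StringIO

data = "name;amount\nwidget;1.234,56\ngadget;7,89"

# '.' as thousands separator and ',' as decimal point.
df = pd.read_csv(StringIO(data), sep=";", thousands=".", decimal=",")
print(df)
print(df.dtypes)  # 'amount' is parsed as float64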

float_precision string, default None

Specifies which converter the C engine should use for floating-point values. The options are None for the ordinary converter, high for the high-precision converter, and round_trip for the round-trip converter
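A sketch comparing the three converters; the long decimal literal is only an illustration of a value where the converters can differ in the last bits.

import pandas as pd
from io import StringIO

val = "0.3066101993807095471566981359501369297504425048828125"
data = "x\n" + val

ordinary = pd.read_csv(StringIO(data), float_precision=None)["x"][0]
high = pd.read_csv(StringIO(data), float_precision="high")["x"][0]
exact = pd.read_csv(StringIO(data), float_precision="round_trip")["x"][0]

# The round-trip converter reproduces float(val) exactly.
print(abs(ordinary - float(val)), abs(high - float(val)), abs(exact - float(val)))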

lineterminator str (length 1), default None

Character to break file into lines. Only valid with C parser

quotechar str (length 1)

The character used to denote the start and end of a quoted item. Quoted items can include the delimiter and it will be ignored

quoting int or csv.QUOTE_* instance, default 0

Control field quoting behavior per the csv.QUOTE_* constants. Use one of QUOTE_MINIMAL (0), QUOTE_ALL (1), QUOTE_NONNUMERIC (2) or QUOTE_NONE (3)
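A minimal sketch of two of these constants; the quoted sample data is invented, and on_bad_lines="skip" (described further below) is used to discard the row that QUOTE_NONE breaks apart.

import csv
import pandas as pd
from io import StringIO

data = 'label,text\n1,"hello, world"\n2,plain'

# Default QUOTE_MINIMAL: the embedded comma stays inside a single field.
print(pd.read_csv(StringIO(data), quoting=csv.QUOTE_MINIMAL))

# QUOTE_NONE treats the quote characters as data, so the first data row now
# has too many fields and is skipped here.
print(pd.read_csv(StringIO(data), quoting=csv.QUOTE_NONE, on_bad_lines="skip"))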

doublequote boolean, default True

When quotechar is specified and quoting is not QUOTE_NONE, indicate whether or not to interpret two consecutive quotechar elements inside a field as a single quotechar element

escapechar str (length 1), default None

One-character string used to escape the delimiter when quoting is QUOTE_NONE

comment str, default None

Indicates that the remainder of the line should not be parsed. If found at the beginning of a line, the line will be ignored altogether. This parameter must be a single character. Like empty lines (as long as skip_blank_lines=True), fully commented lines are ignored by the header parameter but not by skiprows. For example, if comment='#', parsing '#empty\na,b,c\n1,2,3' with header=0 will result in 'a,b,c' being treated as the header
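The same example as a runnable sketch:

import pandas as pd
from io import StringIO

data = "#empty\na,b,c\n1,2,3"

# The fully commented first line is dropped before the header is determined,
# so 'a,b,c' is used as the header row.
print(pd.read_csv(StringIO(data), comment="#", header=0))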

encoding str, default None

Encoding to use for UTF when reading/writing (e.g. 'utf-8').

dialect str or csv.Dialect instance, default None

If provided, this parameter will override values (default or not) for the following parameters: delimiter, doublequote, escapechar, skipinitialspace, quotechar, and quoting. If it is necessary to override values, a ParserWarning will be issued. See the csv.Dialect documentation for more details
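A sketch of passing a dialect by name; the 'pipes' dialect and the pipe-separated sample data are assumptions for illustration.

import csv
import pandas as pd
from io import StringIO

# Register a named dialect, then refer to it from read_csv.
csv.register_dialect("pipes", delimiter="|", skipinitialspace=True)

data = "a| b| c\n1| 2| 3"
print(pd.read_csv(StringIO(data), dialect="pipes"))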

Error handling

error_bad_lines boolean, optional, default None

Lines with too many fields (e.g. a csv line with too many commas) will by default cause an exception to be raised, and no DataFrame will be returned. If False, then these "bad lines" will be dropped from the DataFrame that is returned. See below

Deprecated since version 1.3.0: the on_bad_lines parameter should be used instead to specify behavior upon encountering a bad line.

warn_bad_lines boolean, optional, default None

If error_bad_lines is False, and warn_bad_lines is True, a warning for each "bad line" will be output

Deprecated since version 1.3.0: the on_bad_lines parameter should be used instead to specify behavior upon encountering a bad line.

on_bad_lines (‘error’, ‘warn’, ‘skip’), default ‘error’

Specifies what to do upon encountering a bad line (a line with too many fields). Allowed values are

  • ‘error’, raise a ParserError when a bad line is encountered

  • ‘warn’, print a warning when a bad line is encountered and skip that line

  • ‘skip’, skip bad lines without raising or warning when they are encountered

New in version 1.3.0.
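A minimal sketch; the third data line is deliberately malformed (too many fields) and the sample content is invented.

import pandas as pd
from io import StringIO

data = "a,b,c\n1,2,3\n4,5,6,7\n8,9,10"

# 'skip' silently drops the malformed line; 'warn' would also print a warning,
# and the default 'error' raises a ParserError instead.
print(pd.read_csv(StringIO(data), on_bad_lines="skip"))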

Specifying column data types

You can indicate the data type for the whole DataFrame or for individual columns:

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

Fortunately, pandas offers more than one way to ensure that your column(s) contain only one dtype. If you're unfamiliar with these concepts, see the pandas documentation to learn more about dtypes and about object conversion

For instance, you can use the converters argument of read_csv():

In [21]: data = "col_1\n1\n2\n'A'\n4.22"

In [22]: df = pd.read_csv(StringIO(data), converters={"col_1": str})

In [23]: df
Out[23]: 
  col_1
0     1
1     2
2   'A'
3  4.22

In [24]: df["col_1"].apply(type).value_counts()
Out[24]: 
<class 'str'>    4
Name: col_1, dtype: int64

Or you can use the to_numeric() function to coerce the dtypes after reading in the data:

In [25]: df2 = pd.read_csv(StringIO(data))

In [26]: df2["col_1"] = pd.to_numeric(df2["col_1"], errors="coerce")

In [27]: df2
Out[27]: 
   col_1
0   1.00
1   2.00
2    NaN
3   4.22

In [28]: df2["col_1"].apply(type).value_counts()
Out[28]: 
<class 'float'>    4
Name: col_1, dtype: int64

which will convert all valid parsing to floats, leaving the invalid parsing as NaN

Ultimately, how you deal with reading in columns containing mixed dtypes depends on your specific needs. In the case above, if you wanted to NaN out the data anomalies, then to_numeric() is probably your best option. However, if you wanted all of the data to be coerced, no matter the type, then using the converters argument of read_csv() would certainly be worth trying

Note

In some cases, reading in abnormal data with columns containing mixed dtypes will result in an inconsistent dataset. If you rely on pandas to infer the dtypes of your columns, the parsing engine will go and infer the dtypes for different chunks of the data, rather than the whole dataset at once. Consequently, you can end up with column(s) with mixed dtypes. For example,

In [29]: col_1 = list(range(500000)) + ["a", "b"] + list(range(500000))

In [30]: df = pd.DataFrame({"col_1": col_1})

In [31]: df.to_csv("foo.csv")

In [32]: mixed_df = pd.read_csv("foo.csv")

In [33]: mixed_df["col_1"].apply(type).value_counts()
Out[33]: 
<class 'int'>    737858
<class 'str'>    262144
Name: col_1, dtype: int64

In [34]: mixed_df["col_1"].dtype
Out[34]: dtype('O')

will result in mixed_df containing an int dtype for certain chunks of the column, and str for others, due to the mixed dtypes in the data that was read in. It is important to note that the overall column will be marked with a dtype of object, which is used for columns with mixed dtypes

Specifying categorical dtype

Categorical columns can be parsed directly by specifying dtype='category' or dtype=CategoricalDtype(categories, ordered)

In [35]: data = "col1,col2,col3\na,b,1\na,b,2\nc,d,3"

In [36]: pd.read_csv(StringIO(data))
Out[36]: 
  col1 col2  col3
0    a    b     1
1    a    b     2
2    c    d     3

In [37]: pd.read_csv(StringIO(data)).dtypes
Out[37]: 
col1    object
col2    object
col3     int64
dtype: object

In [38]: pd.read_csv(StringIO(data), dtype="category").dtypes
Out[38]: 
col1    category
col2    category
col3    category
dtype: object

Individual columns can be parsed as a Categorical using a dict specification:

In [39]: pd.read_csv(StringIO(data), dtype={"col1": "category"}).dtypes
Out[39]: 
col1    category
col2      object
col3       int64
dtype: object

Specifying dtype="category" will result in an unordered Categorical whose categories are the unique values observed in the data. For more control on the categories and order, create a CategoricalDtype ahead of time, and pass that for that column's dtype:

In [40]: from pandas.api.types import CategoricalDtype

In [41]: dtype = CategoricalDtype(["d", "c", "b", "a"], ordered=True)

In [42]: pd.read_csv(StringIO(data), dtype={"col1": dtype}).dtypes
Out[42]: 
col1    category
col2      object
col3       int64
dtype: object

When using dtype=CategoricalDtype, "unexpected" values outside of dtype.categories are treated as missing values:
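
The example that originally illustrated this behaviour did not survive extraction; the following is a minimal sketch, reusing the col1/col2/col3 data from above and a hypothetical CategoricalDtype that deliberately omits the value "c":

import pandas as pd
from pandas.api.types import CategoricalDtype
from io import StringIO

data = "col1,col2,col3\na,b,1\na,b,2\nc,d,3"

# "c" is not in the declared categories, so it is read back as NaN
dtype = CategoricalDtype(["a", "b", "d"])
pd.read_csv(StringIO(data), dtype={"col1": dtype}).col1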

This matches the behavior of Categorical.set_categories().

Note

With dtype="category", the resulting categories will always be parsed as strings (object dtype). If the categories are numeric they can be converted using the to_numeric() function, or as appropriate, another converter such as to_datetime().

When dtype is a CategoricalDtype with homogeneous categories (all numeric, all datetimes, etc.), the conversion is done automatically.
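
As a hedged sketch of the automatic conversion (the original example was lost), assume a purely numeric column and a CategoricalDtype with integer categories:

import pandas as pd
from pandas.api.types import CategoricalDtype
from io import StringIO

data = "col1,col2,col3\n1,2,3\n4,5,6"

# homogeneous (all-numeric) categories: the parsed strings are converted automatically
dtype = CategoricalDtype([1, 2, 3, 4])
pd.read_csv(StringIO(data), dtype={"col1": dtype}).col1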

Naming and using columns

Handling column names

A file may or may not have a header row. pandas assumes the first row should be used as the column names.


By specifying the names argument in conjunction with header you can indicate other names to use and whether or not to throw away the header row (if any):
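
A minimal sketch of both variants, assuming a small made-up file with an existing header row:

import pandas as pd
from io import StringIO

data = "a,b,c\n1,2,3\n4,5,6\n7,8,9"

# keep the first line as data and use the supplied names
pd.read_csv(StringIO(data), names=["foo", "bar", "baz"], header=None)

# use the supplied names and throw away the header row in the file
pd.read_csv(StringIO(data), names=["foo", "bar", "baz"], header=0)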

If the header is in a row other than the first, pass the row number to header; the preceding rows will be skipped.

Note

Default behavior is to infer the column names: if no names are passed the behavior is identical to header=0 and column names are inferred from the first non-blank line of the file; if column names are passed explicitly then the behavior is identical to header=None.

Duplicate names parsing

Deprecated since version 1.5.0: mangle_dupe_cols was never implemented, and a new argument where the renaming pattern can be specified will be added instead.

If the file or header contains duplicate names, pandas will by default distinguish between them so as to prevent overwriting data:
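
For example (a sketch with made-up values), a repeated column name "a" is deduplicated on read:

import pandas as pd
from io import StringIO

data = "a,b,a\n0,1,2\n3,4,5"

# the second "a" column comes back as "a.1", so no data is overwritten
pd.read_csv(StringIO(data))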

There is no more duplicate data because mangle_dupe_cols=True by default, which modifies a series of duplicate columns, 'X', ..., 'X', to become 'X', 'X.1', ..., 'X.N'.

Filtering columns (usecols)

The usecols argument allows you to select any subset of the columns in a file, either using the column names, position numbers or a callable:
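
A sketch of the three forms, assuming a small made-up table:

import pandas as pd
from io import StringIO

data = "a,b,c,d\n1,2,3,foo\n4,5,6,bar\n7,8,9,baz"

pd.read_csv(StringIO(data), usecols=["b", "d"])    # by column name
pd.read_csv(StringIO(data), usecols=[0, 2, 3])     # by position
pd.read_csv(StringIO(data), usecols=lambda x: x.upper() in ["A", "B", "D"])  # by callable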

The usecols argument can also be used to specify which columns not to use in the final result:
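
For instance, a callable can act as an exclusion filter (a sketch using the same made-up table as above):

import pandas as pd
from io import StringIO

data = "a,b,c,d\n1,2,3,foo\n4,5,6,bar\n7,8,9,baz"

# keep every column except "a" and "c"
pd.read_csv(StringIO(data), usecols=lambda x: x not in ["a", "c"])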

In this case, the callable is specifying that we exclude the "a" and "c" columns from the output.

Comments and empty lines

Ignoring line comments and empty lines

If the comment parameter is specified, then completely commented lines will be ignored. By default, completely blank lines will be ignored as well:
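
A minimal sketch, assuming a made-up buffer containing a commented line and some blank lines:

import pandas as pd
from io import StringIO

data = "\na,b,c\n  \n# commented line\n1,2,3\n\n4,5,6"

# the commented line and the blank lines are dropped
pd.read_csv(StringIO(data), comment="#")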

If skip_blank_lines=False, then read_csv will not ignore blank lines.

Warning

The presence of ignored lines might create ambiguities involving line numbers: the parameter header uses row numbers (ignoring commented and empty lines), while skiprows uses line numbers (including commented and empty lines).

If both header and skiprows are specified, header will be relative to the end of skiprows. For example:
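
A sketch of the difference, assuming a file whose first raw line is a comment:

import pandas as pd
from io import StringIO

data = "#comment\na,b,c\nA,B,C\n1,2,3"

# header counts rows after commented lines are removed ...
pd.read_csv(StringIO(data), comment="#", header=1)

# ... while skiprows counts raw lines, including the commented one
pd.read_csv(StringIO(data), comment="#", skiprows=2)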

Comments

Sometimes comments or meta data may be included in a file.

By default, the parser includes the comments in the output.

We can suppress the comments using the comment keyword:
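
A sketch with made-up patient records whose trailing text should be treated as comments:

import pandas as pd
from io import StringIO

data = (
    "ID,level,category\n"
    "Patient1,123000,x # really unpleasant\n"
    "Patient2,23000,y # wouldn't take his medicine\n"
    "Patient3,1234018,z # awesome"
)

pd.read_csv(StringIO(data))               # the "# ..." text stays in the category column
pd.read_csv(StringIO(data), comment="#")  # everything after "#" is discarded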

Dealing with Unicode data

The encoding argument should be used for encoded unicode data, which will result in byte strings being decoded to unicode in the result:
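
A minimal sketch, assuming UTF-8 encoded bytes that contain non-ASCII characters:

import pandas as pd
from io import BytesIO

raw = "word,length\nTräumen,7\nGrüße,5".encode("utf-8")

df = pd.read_csv(BytesIO(raw), encoding="utf-8")
df["word"]  # decoded back to unicode strings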

Some formats which encode all characters as multiple bytes, like UTF-16, won't parse correctly at all without specifying the encoding.

Index columns and trailing delimiters

If a file has one more column of data than the number of column names, the first column will be used as the DataFrame's row names:
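
A sketch with made-up values, first relying on the implicit index inference and then requesting the same thing explicitly with index_col:

import pandas as pd
from io import StringIO

# three column names, four columns of data: the first column becomes the row labels
data = "a,b,c\n4,apple,bat,5.7\n8,orange,cow,10"
pd.read_csv(StringIO(data))

# the same result via an explicit index column
data = "index,a,b,c\n4,apple,bat,5.7\n8,orange,cow,10"
pd.read_csv(StringIO(data), index_col=0)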

Ordinarily, you can achieve this behavior using the index_col option.

There are some exception cases when a file has been prepared with delimiters at the end of each data line, confusing the parser. To explicitly disable the index column inference and discard the last column, pass index_col=False:
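
A sketch of the trailing-delimiter case with made-up values:

import pandas as pd
from io import StringIO

# every data line ends with an extra delimiter
data = "a,b,c\n4,apple,bat,\n8,orange,cow,"

pd.read_csv(StringIO(data))                   # the first column is inferred as the index
pd.read_csv(StringIO(data), index_col=False)  # no index inference; the extra trailing field is discarded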

If a subset of data is being parsed using the usecols option, the index_col specification is based on that subset, not the original data.

Date Handling

Specifying date columns

To better facilitate working with datetime data, use the parse_dates and date_parser keyword arguments to allow users to specify a variety of columns and date/time formats to turn the input text data into datetime objects.

The simplest case is to just pass in parse_dates=True:
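
A minimal sketch, assuming a made-up file whose first column holds dates and is also used as the index:

import pandas as pd
from io import StringIO

data = "date,A,B\n20090101,a,1\n20090102,b,2\n20090103,c,3"

df = pd.read_csv(StringIO(data), index_col=0, parse_dates=True)
df.index  # a DatetimeIndex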

It is often the case that we may want to store date and time data separately, or store various date fields separately. The parse_dates keyword can be used to specify a combination of columns to parse the dates and/or times from.

You can specify a list of column lists to parse_dates; the resulting date columns will be prepended to the output (so as to not affect the existing column order) and the new column names will be the concatenation of the component column names:
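
A sketch assuming a headerless, made-up file in which column 1 holds a date and columns 2 and 3 hold times:

import pandas as pd
from io import StringIO

data = (
    "KORD,19990127, 19:00:00, 18:56:00, 0.81\n"
    "KORD,19990127, 20:00:00, 19:56:00, 0.01\n"
    "KORD,19990127, 21:00:00, 20:56:00, -0.59"
)

# combine column 1 with column 2, and column 1 with column 3,
# into two new datetime columns named "1_2" and "1_3"
pd.read_csv(StringIO(data), header=None, parse_dates=[[1, 2], [1, 3]])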

By default the parser removes the component date columns, but you can choose to retain them via the keep_date_col keyword.

Note that if you wish to combine multiple columns into a single date column, a nested list must be used. In other words, parse_dates=[1, 2] indicates that the second and third columns should each be parsed as separate date columns, while parse_dates=[[1, 2]] means the two columns should be parsed into a single column.

You can also use a dict to specify custom name columns.

It is important to remember that if multiple text columns are to be parsed into a single date column, then a new column is prepended to the data. The index_col specification is based off of this new set of columns rather than the original data columns.

Note

If a column or index contains an unparsable date, the entire column or index will be returned unaltered as an object data type. For non-standard datetime parsing, use to_datetime() after pd.read_csv().

Note

read_csv has a fast path for parsing datetime strings in iso8601 format, e.g. "2000-01-01T00:01:02+00:00" and similar variations. If you can arrange for your data to store datetimes in this format, load times will be significantly faster; ~20x has been observed.

Date parsing functions

Finally, the parser allows you to specify a custom date_parser function to take full advantage of the flexibility of the date parsing API.

pandas will try to call the date_parser function in different ways, advancing to the next if an exception occurs:

  1. date_parser is first called with one or more arrays as arguments, as defined using parse_dates (e.g., date_parser(['2013', '2013'], ['1', '2'])).

  2. If #1 fails, date_parser is called with all the columns concatenated row-wise into a single array (e.g., date_parser(['2013 1', '2013 2'])).

Note that performance-wise, you should try these methods of parsing dates in order

  1. Try to infer the format using infer_datetime_format=True (see section below).

  2. If you know the format, use pd.to_datetime(): date_parser=lambda x: pd.to_datetime(x, format=...).

  3. If you have a really non-standard format, use a custom date_parser function. For optimal performance, this should be vectorized, i.e., it should accept arrays as arguments.

Parsing a CSV with mixed timezones

pandas cannot natively represent a column or index with mixed timezones. If your CSV file contains columns with a mixture of timezones, the default result will be an object-dtype column with strings, even with parse_dates:
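
A minimal sketch with two made-up timestamps carrying different UTC offsets:

import pandas as pd
from io import StringIO

content = "a\n2000-01-01T00:00:00+05:00\n2000-01-01T00:00:00+06:00"

df = pd.read_csv(StringIO(content), parse_dates=["a"])
df["a"]  # object dtype: the mixed offsets cannot be represented as a single datetime64 column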

To parse the mixed-timezone values as a datetime column, pass a partially-applied to_datetime() with utc=True as the date_parser:
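
Continuing the sketch above, functools.partial can supply the partially-applied converter:

import pandas as pd
from io import StringIO
from functools import partial

content = "a\n2000-01-01T00:00:00+05:00\n2000-01-01T00:00:00+06:00"

# convert everything to UTC so the column becomes datetime64[ns, UTC]
df = pd.read_csv(
    StringIO(content),
    parse_dates=["a"],
    date_parser=partial(pd.to_datetime, utc=True),
)
df["a"]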

Inferring datetime format

If you have parse_dates enabled for some or all of your columns, and your datetime strings are all formatted the same way, you may get a large speed up by setting infer_datetime_format=True. If set, pandas will attempt to guess the format of your datetime strings, and then use a faster means of parsing the strings; 5-10x parsing speeds have been observed. pandas will fall back to the usual parsing if either the format cannot be guessed or the format that was guessed cannot properly parse the entire column of strings. So in general, infer_datetime_format should not have any negative consequences if enabled.

Here are some examples of datetime strings that can be guessed (all representing December 30th, 2011 at 00:00:00):

  • “20111230”

  • “2011/12/30”

  • “20111230 00:00:00”

  • “12/30/2011 00:00:00”

  • “30/Dec/2011 00:00:00”

  • “30/December/2011 00:00:00”

Note that infer_datetime_format is sensitive to dayfirst. With dayfirst=True, it will guess "01/12/2011" to be December 1st. With dayfirst=False (the default) it will guess "01/12/2011" to be January 12th.

International date formats

While US date formats tend to be MM/DD/YYYY, many international formats use DD/MM/YYYY instead. For convenience, a dayfirst keyword is provided:
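
A sketch with made-up day-first dates:

import pandas as pd
from io import StringIO

data = "date,value,cat\n1/6/2000,5,a\n2/6/2000,10,b\n3/6/2000,15,c"

pd.read_csv(StringIO(data), parse_dates=[0])                 # read as January 6th, February 6th, ...
pd.read_csv(StringIO(data), dayfirst=True, parse_dates=[0])  # read as June 1st, June 2nd, ...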

Writing CSVs to binary file objects

New in version 1.2.0.

df.to_csv(..., mode="wb") allows writing a CSV to a file object opened in binary mode. In most cases, it is not necessary to specify mode, as pandas will auto-detect whether the file object is opened in text or binary mode:
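
A minimal sketch writing a small made-up frame into an in-memory binary buffer:

import io
import pandas as pd

df = pd.DataFrame([0, 1, 2])
buffer = io.BytesIO()

# the buffer is binary, so the (here gzip-compressed) CSV bytes are written directly
df.to_csv(buffer, encoding="utf-8", compression="gzip")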

Specifying the method for floating-point conversion

The parameter float_precision can be specified in order to use a specific floating-point converter during parsing with the C engine. The options are the ordinary converter, the high-precision converter, and the round-trip converter (which is guaranteed to round-trip values after writing to a file). For example:
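
A sketch comparing the converters on a made-up value with many significant digits:

import pandas as pd
from io import StringIO

val = "0.3066101993807095471566981359501369297504425048828125"
data = "a,b,c\n1,2,{0}".format(val)

# ordinary, high-precision and round-trip converters
abs(pd.read_csv(StringIO(data), engine="c", float_precision=None)["c"][0] - float(val))
abs(pd.read_csv(StringIO(data), engine="c", float_precision="high")["c"][0] - float(val))
abs(pd.read_csv(StringIO(data), engine="c", float_precision="round_trip")["c"][0] - float(val))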

Thousands separators

For large numbers that have been written with a thousands separator, you can set the thousands keyword to a string of length 1 so that integers will be parsed correctly.

By default, numbers with a thousands separator will be parsed as strings:
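
A sketch with made-up pipe-separated data whose numbers contain commas:

import pandas as pd
from io import StringIO

data = "ID|level|category\nPatient1|123,000|x\nPatient2|23,000|y\nPatient3|1,234,018|z"

df = pd.read_csv(StringIO(data), sep="|")
df["level"].dtype  # object: "123,000" and friends stay as strings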

The thousands keyword allows integers to be parsed correctly:
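
With the same made-up data as above, passing thousands="," turns the column into integers:

import pandas as pd
from io import StringIO

data = "ID|level|category\nPatient1|123,000|x\nPatient2|23,000|y\nPatient3|1,234,018|z"

df = pd.read_csv(StringIO(data), sep="|", thousands=",")
df["level"].dtype  # int64: the separators are stripped before parsing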

NA values

To control which values are parsed as missing values (which are signified by NaN), specify a string in na_values. If you specify a list of strings, then all values in it are considered to be missing values. If you specify a number (a float, like 5.0, or an integer like 5), the corresponding equivalent values will also imply a missing value (in this case effectively [5.0, 5] are recognized as NaN).

To completely override the default values that are recognized as missing, specify keep_default_na=False.

The default NaN recognized values include '', '#N/A', 'N/A', 'NA', '<NA>', 'NULL', 'null', 'NaN', 'nan' and similar variants; see the na_values documentation for the complete list.

Let us consider a few examples:
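
A sketch with made-up values, asking the parser to also treat the number 5 as missing:

import pandas as pd
from io import StringIO

data = "num,name\n5,apple\n6,pear\n7,plum"

# 5 (and therefore 5.0) is added to the set of recognized missing values
pd.read_csv(StringIO(data), na_values=[5])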

In the example above, 5 and 5.0 will be recognized as NaN, in addition to the defaults. A string will first be interpreted as a numerical 5, then as a NaN.

With keep_default_na=False and na_values=[""], only an empty field will be recognized as NaN.

With keep_default_na=False and na_values=["NA", "0"], both NA and 0 as strings are recognized as NaN.

Nilai default, selain string

In [29]: col_1 = list(range(500000)) + ["a", "b"] + list(range(500000))

In [30]: df = pd.DataFrame({"col_1": col_1})

In [31]: df.to_csv("foo.csv")

In [32]: mixed_df = pd.read_csv("foo.csv")

In [33]: mixed_df["col_1"].apply(type).value_counts()
Out[33]: 
<class 'int'>    737858
<class 'str'>    262144
Name: col_1, dtype: int64

In [34]: mixed_df["col_1"].dtype
Out[34]: dtype('O')
29 dikenali sebagai
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
46
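A minimal sketch of these three variants, using a made-up inline CSV (the column names and values are only for illustration):

from io import StringIO
import pandas as pd

# a tiny made-up CSV: an empty field, a literal "NA", a literal "0", a literal "Nope"
data = "id,score\n1,\n2,NA\n3,0\n4,Nope"

# only the empty field becomes NaN; the default NA strings are disabled
pd.read_csv(StringIO(data), keep_default_na=False, na_values=[""])

# the strings "NA" and "0" become NaN; the empty field stays an empty string
pd.read_csv(StringIO(data), keep_default_na=False, na_values=["NA", "0"])

# the defaults stay active and "Nope" is treated as NaN as well
pd.read_csv(StringIO(data), na_values=["Nope"])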

Infinity

inf-like values will be parsed as np.inf (positive infinity), and -inf as -np.inf (negative infinity). Parsing ignores the case of the value, meaning Inf will also be parsed as np.inf.
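For example, with a throwaway single-column CSV (the column name is arbitrary), all three spellings parse to floating-point infinity:

from io import StringIO
import pandas as pd

data = "level\ninf\n-Inf\nINF"
pd.read_csv(StringIO(data))   # every row parses to positive or negative infinity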

Returning Series

Using the squeeze keyword, the parser will return output with a single column as a Series.

Deprecated since version 1.4.0: users should append .squeeze("columns") to the DataFrame returned by read_csv instead.
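A small sketch of the recommended replacement, assuming a one-column CSV read from an inline string:

from io import StringIO
import pandas as pd

data = "a\n1\n2\n3"
s = pd.read_csv(StringIO(data)).squeeze("columns")   # a Series instead of a one-column DataFrame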

Boolean values

The common values True, False, TRUE, and FALSE are all recognized as boolean. Occasionally you might want to recognize other values as being boolean. To do this, use the true_values and false_values options, as in the sketch below.
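For instance, with made-up data whose flag column uses "Yes"/"No":

from io import StringIO
import pandas as pd

data = "a,flag,c\n1,Yes,2\n3,No,4"
pd.read_csv(StringIO(data), true_values=["Yes"], false_values=["No"])   # flag becomes True/False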

Handling "bad" lines

Some files may have malformed lines with too few fields or too many. Lines with too few fields will have NA values filled in the trailing fields. Lines with too many fields will raise an error by default.

You can elect to skip bad lines instead, as sketched below.
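A minimal sketch, assuming a pandas version that provides the on_bad_lines keyword (1.3 or later); the sample data is made up:

from io import StringIO
import pandas as pd

# the second data row has one field too many
data = "a,b,c\n1,2,3\n4,5,6,7\n8,9,10"

# pd.read_csv(StringIO(data))                      # would raise ParserError by default
pd.read_csv(StringIO(data), on_bad_lines="skip")   # drops the offending row instead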

Or you can pass a callable to handle the bad line if engine="python" is used. The bad line will be a list of strings that was split by the sep.
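A sketch of such a handler, assuming a pandas version where on_bad_lines accepts a callable together with engine="python" (1.4 or later); the handler name and the trimming strategy are illustrative only:

from io import StringIO
import pandas as pd

data = "a,b,c\n1,2,3\n4,5,6,7\n8,9,10"

def handle_bad_line(line):
    # `line` is the raw row split on the separator, e.g. ['4', '5', '6', '7'];
    # returning a trimmed list keeps the row, returning None would drop it
    return line[:3]

pd.read_csv(StringIO(data), engine="python", on_bad_lines=handle_bad_line)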

You can also use the usecols parameter to eliminate extraneous column data that appears in some lines but not in others.

If you want to keep all of the data, including the lines with too many fields, you can specify a sufficient number of names. This ensures that lines with too few fields are filled with NaN.
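For example, with the same kind of malformed data as above (redefined here so the snippet stands alone):

from io import StringIO
import pandas as pd

data = "a,b,c\n1,2,3\n4,5,6,7\n8,9,10"

# four names are enough for the widest row; shorter rows get NaN in column "d".
# Note that the original header line is now read as an ordinary data row.
pd.read_csv(StringIO(data), names=["a", "b", "c", "d"])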

Dialect

The dialect keyword gives greater flexibility in specifying the file format. By default it uses the Excel dialect, but you can specify either the dialect name or a csv.Dialect instance.

Suppose you had data with unenclosed quotes.

By default, read_csv uses the Excel dialect and treats the double quote as the quote character, which causes it to fail when it finds a newline before it finds the closing double quote.

We can get around this using dialect, as in the sketch below.

All of the dialect options can also be specified separately as keyword arguments.
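A sketch of both approaches, using an inline string with an unclosed quote; the data itself is made up:

import csv
from io import StringIO
import pandas as pd

# the second row opens a quote that is never closed
data = "label1,label2,label3\n" 'index1,"a,c,e\n' "index2,b,d,f"

dia = csv.excel()
dia.quoting = csv.QUOTE_NONE               # stop treating '"' as a quote character
pd.read_csv(StringIO(data), dialect=dia)

# the same option can be passed directly as a keyword argument
pd.read_csv(StringIO(data), quoting=csv.QUOTE_NONE)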

Another common dialect option is skipinitialspace, which skips any whitespace after a delimiter.
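For example, with a made-up file that puts a space after each comma:

from io import StringIO
import pandas as pd

data = "a, b, c\n1, 2, 3\n4, 5, 6"
pd.read_csv(StringIO(data), skipinitialspace=True)   # column names and values come back without the padding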

The parsers make every attempt to "do the right thing" and not be fragile. Type inference is a pretty big deal: if a column can be coerced to integer dtype without altering the contents, the parser will do so. Any non-numeric columns will come through as object dtype, as with the rest of pandas' objects.

Quoting and Escape Characters

Quotes (and other escape characters) in embedded fields can be handled in any number of ways. One way is to use backslashes; to properly parse such data, you should pass the escapechar option, as sketched below.
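A sketch with a backslash-escaped quote embedded in a quoted field:

from io import StringIO
import pandas as pd

# the field contains an embedded, backslash-escaped double quote
data = 'a,b\n"hello, \\"Bob\\", nice to see you",5'

pd.read_csv(StringIO(data), escapechar="\\")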

Files with fixed-width columns

While read_csv() reads delimited data, the read_fwf() function works with data files that have known and fixed column widths. The function parameters of read_fwf are largely the same as those of read_csv, with two extra parameters and a different usage of the delimiter parameter:

  • colspecs: A list of pairs (tuples) giving the extents of the fixed-width fields of each line as half-open intervals (i.e., [from, to[ ). The string value 'infer' can be used to instruct the parser to try detecting the column specifications from the first 100 rows of the data. The default behaviour, if not specified, is to infer.

  • widths: A list of field widths which can be used instead of 'colspecs' if the intervals are contiguous.

  • delimiter: Characters to consider as filler characters in the fixed-width file. Can be used to specify the filler character of the fields if it is not spaces (e.g., '~').

Consider a typical fixed-width data file.

To parse such a file into a DataFrame, we simply need to supply the column specifications to the read_fwf function along with the file name (see the sketch below).
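A minimal sketch, using a made-up fixed-width layout read from an inline string instead of a file name; the column extents are chosen to match that layout:

from io import StringIO
import pandas as pd

# a made-up fixed-width file: a 6-character id and two right-aligned numbers
fwf_data = (
    "id8141  360.24  149.91\n"
    "id1594  444.95  166.99\n"
)

# half-open [from, to[ intervals, one per field
colspecs = [(0, 6), (8, 14), (16, 22)]
pd.read_fwf(StringIO(fwf_data), colspecs=colspecs, header=None, index_col=0)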

Note how the parser automatically picks column names X.<column number> when the header=None argument is specified. Alternatively, you can supply just the column widths for contiguous columns.
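The same made-up layout, redefined so the snippet stands alone; each width spans a field plus its trailing separator spaces:

from io import StringIO
import pandas as pd

fwf_data = (
    "id8141  360.24  149.91\n"
    "id1594  444.95  166.99\n"
)

# surrounding whitespace inside each width is stripped by the parser
pd.read_fwf(StringIO(fwf_data), widths=[8, 8, 6], header=None)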

The parser will take care of extra whitespace around the columns, so it is fine to have extra separation between the columns in the file.

By default, read_fwf will try to infer the file's colspecs by using the first 100 rows of the file. It can do so only when the columns are aligned and correctly separated by the provided delimiter (the default delimiter is whitespace).

read_fwf also supports the dtype parameter for specifying the types of parsed columns to be different from the inferred type.

Indexes

Files with an "implicit" index column

Consider a file with one fewer entry in the header than the number of data columns.

In this special case, read_csv assumes that the first column is to be used as the index of the DataFrame.
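For example, with a made-up file whose header names only the last three of four fields:

from io import StringIO
import pandas as pd

data = "A,B,C\n20090101,a,1,2\n20090102,b,3,4\n20090103,c,4,5"

pd.read_csv(StringIO(data))   # the first (unnamed) column becomes the index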

Note that the dates were not automatically parsed. In that case you would need to do as before.

Reading an index with a MultiIndex

Suppose you have data indexed by two columns.

The index_col argument to read_csv can take a list of column numbers to turn multiple columns into a MultiIndex for the index of the returned object.
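A small sketch with made-up data indexed by year and individual:

from io import StringIO
import pandas as pd

data = "year,indiv,zit,xit\n1977,A,1.2,.6\n1977,B,1.5,.5"

df = pd.read_csv(StringIO(data), index_col=[0, 1])   # MultiIndex of (year, indiv)
df.index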

Reading columns with a MultiIndex

By specifying a list of row locations for the header argument, you can read in a MultiIndex for the columns. Specifying non-consecutive rows will skip the intervening rows.
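A minimal sketch, assuming two header rows stacked on top of the data:

from io import StringIO
import pandas as pd

# two header rows -> a MultiIndex on the columns
data = "a,a,b\nx,y,z\n1,2,3\n4,5,6"

df = pd.read_csv(StringIO(data), header=[0, 1])
df.columns   # MultiIndex with tuples ('a', 'x'), ('a', 'y'), ('b', 'z')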

read_csv is also able to interpret a more common format of multi-column indices.

Note

If index_col is not specified (e.g. you do not have an index, or wrote it with df.to_csv(..., index=False)), then any names on the columns index will be lost.

Automatically "sniffing" the delimiter

read_csv is capable of inferring delimited (not necessarily comma-separated) files, as pandas uses the csv.Sniffer class of the csv module. For this, you have to specify sep=None.
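For example, with semicolon-separated data; the Python engine is named explicitly because the sniffing path requires it:

from io import StringIO
import pandas as pd

data = "a;b;c\n1;2;3\n4;5;6"

pd.read_csv(StringIO(data), sep=None, engine="python")   # the delimiter is detected automatically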

Reading multiple files to create a single DataFrame

It is best to use concat() to combine multiple files. See the cookbook for an example.

Iterating through files chunk by chunk

Suppose you wish to iterate through a (potentially very large) file lazily rather than reading the entire file into memory.

By specifying a chunksize to read_csv, the return value will be an iterable object of type TextFileReader, as sketched below.
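A sketch with a small generated CSV, assuming a pandas version (1.2 or later) where the reader acts as a context manager:

from io import StringIO
import pandas as pd

data = "a,b,c\n" + "\n".join(f"{i},{i + 1},{i + 2}" for i in range(10))

# each iteration yields a DataFrame with at most four rows
with pd.read_csv(StringIO(data), chunksize=4) as reader:
    for chunk in reader:
        print(len(chunk))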

Changed in version 1.2: read_csv/json/sas return a context manager when iterating through a file.

Specifying iterator=True will also return the TextFileReader object.
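For example, pulling only the first few rows via get_chunk:

from io import StringIO
import pandas as pd

data = "a,b,c\n" + "\n".join(f"{i},{i + 1},{i + 2}" for i in range(10))

with pd.read_csv(StringIO(data), iterator=True) as reader:
    first_rows = reader.get_chunk(5)   # read just the first five rows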

Specifying the parser engine

pandas currently supports three engines: the C engine, the Python engine, and an experimental pyarrow engine (which requires the pyarrow package). In general, the pyarrow engine is fastest on larger workloads and is equivalent in speed to the C engine on most other workloads. The Python engine tends to be slower than the pyarrow and C engines on most workloads. However, the pyarrow engine is much less robust than the C engine, which in turn lacks a few features compared with the Python engine.

Where possible, pandas uses the C parser (specified as engine='c'), but it may fall back to the Python engine if options that the C engine does not support are specified.

Currently, options unsupported by the C and pyarrow engines include:

  • sep other than a single character (e.g. regex separators)
  • skipfooter
  • sep=None with delim_whitespace=False

Specifying any of the above options will produce a ParserWarning unless the Python engine is selected explicitly using engine='python'.
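For example, a multi-character (regex) separator is only handled by the Python engine, so naming the engine explicitly avoids the warning:

from io import StringIO
import pandas as pd

data = "a||b||c\n1||2||3"

pd.read_csv(StringIO(data), sep=r"\|\|", engine="python")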

Options that are unsupported by the pyarrow engine which are not covered by the list above include:

  • float_precision
  • chunksize
  • comment
  • nrows
  • thousands
  • memory_map
  • dialect
  • warn_bad_lines
  • error_bad_lines
  • on_bad_lines
  • delim_whitespace
  • quoting
  • lineterminator
  • converters
  • decimal
  • iterator
  • dayfirst
  • infer_datetime_format
  • verbose
  • skipinitialspace
  • low_memory

Specifying any of these unsupported options with engine="pyarrow" will raise a ValueError.
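
For illustration only, a minimal sketch that reuses the col1/col2/col3 sample from earlier and assumes chunksize is among the options unsupported by the pyarrow engine:

from io import StringIO

import pandas as pd

data = "col1,col2,col3\na,b,1\na,b,2\nc,d,3"

# chunksize is assumed to be unsupported by the pyarrow engine
# (pyarrow must be installed), so a ValueError is expected here.
try:
    pd.read_csv(StringIO(data), engine="pyarrow", chunksize=2)
except ValueError as err:
    print("ValueError:", err)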

Reading/writing remote files

You can pass a URL to many of pandas' IO functions to read or write a remote file - the following example shows reading a CSV file:

df = pd.read_csv("https://download.bls.gov/pub/time.series/cu/cu.item", sep="\t")

New in version 1.3.0

A custom header can be sent alongside HTTP(s) requests by passing a dictionary of header key-value mappings to the storage_options keyword argument, as shown below:

headers = {"User-Agent": "pandas"}
df = pd.read_csv(
    "https://download.bls.gov/pub/time.series/cu/cu.item",
    sep="\t",
    storage_options=headers,
)

All URLs which are not local files or HTTP(s) are handled by fsspec, if installed, and its various filesystem implementations (including Amazon S3, Google Cloud, SSH, FTP, webHDFS…). Some of these implementations require additional packages to be installed; for example, S3 URLs require the s3fs library:

df = pd.read_json("s3://pandas-test/adatafile.json")

When dealing with remote storage systems, you might need extra configuration with environment variables or config files in special locations. For example, to access data in an S3 bucket, you will need to define credentials in one of the several ways listed in the S3Fs documentation. The same is true for several of the storage backends, and you should follow the links in the fsspec documentation for the implementations built into fsspec and for those not included in the main fsspec distribution.

You can also pass parameters directly to the backend driver. For example, if you do not have S3 credentials, you can still access public data by specifying an anonymous connection, such as the following.

New in version 1.2.0

pd.read_csv(
    "s3://ncei-wcsd-archive/data/processed/SH1305/18kHz/SaKe2013"
    "-D20130523-T080854_to_SaKe2013-D20130523-T085643.csv",
    storage_options={"anon": True},
)

fsspec also allows complex URLs, for accessing data in compressed archives, local caching of files, and more. To locally cache the above example, you would modify the call to

pd.read_csv(
    "simplecache::s3://ncei-wcsd-archive/data/processed/SH1305/18kHz/"
    "SaKe2013-D20130523-T080854_to_SaKe2013-D20130523-T085643.csv",
    storage_options={"s3": {"anon": True}},
)

where we specify that the "anon" parameter is meant for the "s3" part of the implementation, not for the caching implementation. Note that this caches to a temporary directory for the duration of the session only, but you can also specify a permanent store.
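
As a rough sketch of such a permanent store, assuming fsspec's "filecache" protocol and its cache_storage option (the bucket path below is hypothetical):

import pandas as pd

# Hypothetical S3 path; "filecache::" asks fsspec to keep a persistent
# local copy in the directory given by cache_storage (an assumption here).
df = pd.read_csv(
    "filecache::s3://my-public-bucket/data/sample.csv",
    storage_options={
        "s3": {"anon": True},
        "filecache": {"cache_storage": "/tmp/pandas-remote-cache"},
    },
)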

Writing to CSV format

The Series and DataFrame objects have an instance method to_csv which allows storing the contents of the object as a comma-separated-values file. The function takes a number of arguments; only the first is required. A short usage sketch follows the parameter list below.

  • path_or_buf: A string path to the file to write, or a file object. If a file object, it must be opened with newline=''.

  • sep: Field delimiter for the output file (default ",").

  • na_rep: A string representation of a missing value (default '').

  • float_format: Format string for floating point numbers.

  • columns: Columns to write (default None).

  • header: Whether to write out the column names (default True).

  • index: Whether to write row (index) names (default True).

  • index_label: Column label(s) for index column(s) if desired. If None (default), and header and index are True, then the index names are used. (A sequence should be given if the DataFrame uses MultiIndex.)

  • mode: Python write mode, default 'w'.

  • encoding: A string representing the encoding to use if the contents are non-ASCII, for Python versions prior to 3.

  • lineterminator: Character sequence denoting line end (default os.linesep).

  • quoting: Set quoting rules as in the csv module (default csv.QUOTE_MINIMAL). Note that if you have set a float_format then floats are converted to strings and csv.QUOTE_NONNUMERIC will treat them as non-numeric.

  • quotechar: Character used to quote fields (default '"').

  • doublequote: Control quoting of quotechar in fields (default True).

  • escapechar: Character used to escape sep and quotechar when appropriate (default None).

  • chunksize: Number of rows to write at a time.

  • date_format: Format string for datetime objects.
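
A minimal sketch tying a few of these arguments together (the file name foo.csv and the column values are placeholders):

import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4.56789, None, 6.5]})

# Write without the row index, format floats to two decimals,
# and represent missing values as "NA".
df.to_csv("foo.csv", index=False, float_format="%.2f", na_rep="NA")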

Writing a formatted string

The DataFrame object has an instance method to_string which allows control over the string representation of the object. All arguments are optional (a usage sketch follows the list below):

  • buf default None, for example a StringIO object.

  • columns default None, which columns to write.

  • col_space default None, minimum width of each column.

  • na_rep default NaN, representation of NA value.

  • formatters default None, a dictionary (by column) of functions each of which takes a single argument and returns a formatted string.

  • float_format default None, a function which takes a single (float) argument and returns a formatted string; to be applied to floats in the DataFrame.

  • sparsify default True, set to False for a DataFrame with a hierarchical index to print every MultiIndex key at each row.

  • index_names default True, will print the names of the indices.

  • index default True, will print the index (ie, row labels).

  • header default True, will print the column labels.

  • justify default left, will print column headers left- or right-justified.
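
A minimal to_string sketch using a few of the arguments above (the frame itself is a placeholder):

import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3.5, None]})

# Render the frame as plain text: no row index, a custom NA marker,
# and a callable float_format applied to every float value.
print(df.to_string(index=False, na_rep="-", float_format=lambda x: f"{x:.1f}"))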

The Series object also has a to_string method, but with only the buf, na_rep and float_format arguments. There is also a length argument which, if set to True, will additionally output the length of the Series.
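
For example, a small sketch of Series.to_string with the length argument (the Series values are placeholders):

import pandas as pd

s = pd.Series([1.5, None, 3.0], name="x")

# length=True appends the length of the Series to the rendered string.
print(s.to_string(na_rep="NaN", length=True))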

JSON

Read and write JSON format files and strings.

A Series or DataFrame can be converted to a valid JSON string. Use to_json with optional parameters (a usage sketch follows the list below):

  • path_or_buf: the pathname or buffer to write the output. This can be None, in which case a JSON string is returned.

  • orient:

    Series:

      • default is index

      • allowed values are {split, records, index}

    DataFrame:

      • default is columns

      • allowed values are {split, records, index, columns, values, table}

    The format of the JSON string:

    split: dict like {index -> [index], columns -> [columns], data -> [values]}

    records: list like [{column -> value}, … , {column -> value}]

    index: dict like {index -> {column -> value}}

    columns: dict like {column -> {index -> value}}

    values: just the values array

    table: adhering to the JSON Table Schema

  • date_format: string, type of date conversion, ‘epoch’ for timestamp, ‘iso’ for ISO8601.

  • double_precision: The number of decimal places to use when encoding floating point values, default 10.

  • force_ascii: force encoded string to be ASCII, default True.

  • date_unit: The time unit to encode to, governs timestamp and ISO8601 precision. One of ‘s’, ‘ms’, ‘us’ or ‘ns’ for seconds, milliseconds, microseconds and nanoseconds respectively. Default ‘ms’.

  • default_handler: The handler to call if an object cannot otherwise be converted to a suitable format for JSON. Takes a single argument, which is the object to convert, and returns a serializable object.

  • lines: If records orient, then will write each record per line as json.
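
A minimal to_json sketch combining a few of these parameters (the frame is a placeholder):

import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": [3.14159, None]})

# No path is given, so a JSON string is returned; records orient with
# lines=True emits one JSON object per row, floats rounded to two decimals.
print(df.to_json(orient="records", lines=True, double_precision=2))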

Note

NaN's, NaT's and None will be converted to null, and datetime objects will be converted based on the date_format and date_unit parameters.
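
For instance, missing values are written out as JSON null in this small sketch:

import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, None])

# NaN and None both become null in the encoded string,
# e.g. '{"0":1.0,"1":null,"2":null}'.
print(s.to_json())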

Orient options

There are a number of different options for the format of the resulting JSON file / string. Consider the following

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
43 and
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
62

In [39]: pd.read_csv(StringIO(data), dtype={"col1": "category"}).dtypes
Out[39]: 
col1    category
col2      object
col3       int64
dtype: object
2

Column oriented (the default for

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
43) serializes the data as nested JSON objects with column labels acting as the primary index

In [39]: pd.read_csv(StringIO(data), dtype={"col1": "category"}).dtypes
Out[39]: 
col1    category
col2      object
col3       int64
dtype: object
3

Index oriented (the default for

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
62) similar to column oriented but the index labels are now primary

In [39]: pd.read_csv(StringIO(data), dtype={"col1": "category"}).dtypes
Out[39]: 
col1    category
col2      object
col3       int64
dtype: object
4

Record oriented serializes the data to a JSON array of column -> value records; index labels are not included. This is useful for passing DataFrame data to plotting libraries, for example the JavaScript library d3.js.
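A sketch using the illustrative objects above:

dfjo.to_json(orient="records")
# e.g. '[{"A":1,"B":4,"C":7},{"A":2,"B":5,"C":8},{"A":3,"B":6,"C":9}]'
sjo.to_json(orient="records")
# e.g. '[15,16,17]'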

Value oriented is a bare-bones option which serializes to nested JSON arrays of values only; column and index labels are not included.
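For example:

dfjo.to_json(orient="values")
# e.g. '[[1,4,7],[2,5,8],[3,6,9]]'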

Split oriented serializes to a JSON object containing separate entries for values, index and columns. The name is also included for Series.
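For example:

sjo.to_json(orient="split")
# e.g. '{"name":"D","index":["x","y","z"],"data":[15,16,17]}'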

Table oriented serializes to the JSON Table Schema, allowing for the preservation of metadata including but not limited to dtypes and index names
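For example (the resulting schema payload is discussed in the Table schema section later on):

dfjo.to_json(orient="table")
# a JSON string with "schema" and "data" fields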

Note

Any orient option that encodes to a JSON object will not preserve the ordering of index and column labels during round-trip serialization. If you wish to preserve label ordering, use the split option, as it uses ordered containers.

Date handling

Writing in ISO date format:
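A minimal sketch (dfd is an assumed frame with a datetime column, not the original example):

dfd = pd.DataFrame({"date": pd.date_range("2013-01-01", periods=5)})
dfd.to_json(date_format="iso")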

Writing in ISO date format, with microseconds:
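Reusing the assumed dfd frame from the sketch above:

dfd.to_json(date_format="iso", date_unit="us")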

Epoch timestamps, in seconds:
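Reusing the assumed dfd frame again:

dfd.to_json(date_format="epoch", date_unit="s")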

Writing to a file, with a date index and a date column:
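A sketch with assumed names (dfj2 and the file name test.json are illustrative):

dfj2 = dfd.copy()
dfj2["ints"] = list(range(5))
dfj2["bools"] = True
dfj2.index = pd.date_range("2013-01-01", periods=5)
dfj2.to_json("test.json", date_format="iso")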

Fallback behavior

If the JSON serializer cannot handle the container contents directly, it will fall back in the following manner:

  • if the dtype is unsupported (for example, complex values), then the default_handler, if provided, will be called for each value; otherwise an exception is raised

  • if an object is unsupported it will attempt the following

    • check if the object has defined a toDict method and call it. A toDict method should return a dict, which will then be JSON serialized

    • invoke the default_handler if one was provided

    • convert the object to a dict by traversing its contents. However, this will often fail with an OverflowError or give unexpected results

In general the best approach for unsupported objects or dtypes is to provide a default_handler. For example:
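A sketch of the kind of value that triggers the fallback (the frame below is illustrative, not the original example):

df_complex = pd.DataFrame([1.0, 2.0, complex(1.0, 2.0)])
df_complex.to_json()  # raises: the complex dtype cannot be serialized directly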

can be dealt with by specifying a simple default_handler:
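Continuing the sketch above, converting unhandled values to strings:

df_complex.to_json(default_handler=str)
# e.g. '{"0":{"0":1.0,"1":2.0,"2":"(1+2j)"}}'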

Reading JSON

Reading a JSON string to a pandas object can take a number of parameters. The parser will try to parse a DataFrame if typ is not supplied or is None. To explicitly force Series parsing, pass typ=series.

  • filepath_or_buffer : a VALID JSON string or file handle / StringIO. The string could be a URL. Valid URL schemes include http, ftp, S3, and file. For file URLs, a host is expected. For instance, a local file could be file://localhost/path/to/table.json

  • typ : type of object to recover (series or frame), default 'frame'

  • orient :

    Series
    • default is index

    • allowed values are {split, records, index}

    DataFrame
    • default is columns

    • allowed values are {split, records, index, columns, values, table}

    The format of the JSON string

    split : dict like {index -> [index], columns -> [columns], data -> [values]}

    records : list like [{column -> value}, ... , {column -> value}]

    index : dict like {index -> {column -> value}}

    columns : dict like {column -> {index -> value}}

    values : just the values array

    table : adhering to the JSON Table Schema

  • dtype : if True, infer dtypes; if a dict of column to dtype, then use those; if False, then don't infer dtypes at all. Default is True; applies only to the data

  • convert_axes : boolean, try to convert the axes to the proper dtypes, default is True

  • convert_dates : a list of columns to parse for dates; if True, then try to parse date-like columns, default is True

  • keep_default_dates : boolean, default True. If parsing dates, then parse the default date-like columns

  • numpy : direct decoding to NumPy arrays, default is False. Supports numeric data only, although labels may be non-numeric. Also note that the JSON ordering MUST be the same for each term if numpy=True

  • precise_float : boolean, default False. Set to enable usage of the higher precision (strtod) function when decoding string to double values. The default (False) uses fast but less precise builtin functionality

  • date_unit : string, the timestamp unit to detect if converting dates, default None. By default the timestamp precision will be detected; if this is not desired then pass one of 's', 'ms', 'us' or 'ns' to force timestamp precision to seconds, milliseconds, microseconds or nanoseconds respectively

  • lines : reads the file as one JSON object per line

  • encoding : the encoding to use to decode py3 bytes

  • chunksize : when used in combination with lines=True, return a JsonReader which reads in chunksize lines per iteration

The parser will raise one of ValueError/TypeError/AssertionError if the JSON is not parseable.

If a non-default orient was used when encoding to JSON, be sure to pass the same option here so that decoding produces sensible results; see the orient options above for an overview.

Data conversion

The default of convert_axes=True, dtype=True, and convert_dates=True will try to parse the axes, and all of the data, into appropriate types, including dates. If you need to override specific dtypes, pass a dict to dtype. convert_axes should only be set to False if you need to preserve string-like numbers (e.g. '1', '2') in the axes.

Note

Large integer values may be converted to dates if convert_dates=True and the data and / or column labels appear 'date-like'. The exact threshold depends on the date_unit specified. 'date-like' means that the column label meets one of the following criteria:

  • it ends with '_at'

  • it ends with '_time'

  • it begins with 'timestamp'

  • it is 'modified'

  • it is 'date'

Warning

When reading JSON data, automatic coercing into dtypes has some quirks:

  • an index can be reconstructed in a different order from serialization, that is, the returned order is not guaranteed to be the same as before serialization

  • a column that was float data will be converted to integer if it can be done safely, e.g. a column of 1.

  • bool columns will be converted to integer on reconstruction

Thus there are times when you may want to specify specific dtypes via the dtype keyword argument.

Reading from a JSON string

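A minimal sketch (the JSON literal is illustrative):

pd.read_json('{"A":{"0":1,"1":2},"B":{"0":3,"1":4}}')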

Reading from a file

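Assuming a file written earlier with to_json (the name test.json is an assumption, as in the date handling sketch above):

pd.read_json("test.json")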

Don’t convert any data (but still convert axes and dates)

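For example:

pd.read_json("test.json", dtype=object).dtypes   # all columns come back as object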

Specify dtypes for conversion

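For example (the column names match the assumed dfj2 sketch above):

pd.read_json("test.json", dtype={"ints": "int64", "bools": "bool"}).dtypes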

Preserve string indices

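A sketch: write a frame whose index holds string digits, then read it back with convert_axes=False so the index is not coerced to integers (si is an illustrative name):

import numpy as np

si = pd.DataFrame(np.zeros((4, 4)), columns=list(range(4)), index=[str(i) for i in range(4)])
json = si.to_json()
sij = pd.read_json(json, convert_axes=False)
sij.index   # remains string-typed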

Dates written in nanoseconds need to be read back in nanoseconds

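Reusing the assumed dfj2 frame from the date handling section:

json = dfj2.to_json(date_unit="ns")
dfju = pd.read_json(json, date_unit="ns")   # parse back at nanosecond precision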

The Numpy parameter

Note

This param has been deprecated as of version 1.0.0 and will raise a FutureWarning.

This supports numeric data only. Index and column labels may be non-numeric, e.g. strings, dates etc.

If numpy=True is passed to read_json, an attempt will be made to sniff an appropriate dtype during deserialization and to subsequently decode directly to NumPy arrays, bypassing the need for intermediate Python objects.

This can provide speedups if you are deserialising a large amount of numeric data

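A rough sketch of the kind of comparison meant here (timings vary by machine, and since the numpy option is deprecated this only illustrates the older API):

import numpy as np

randfloats = np.random.uniform(-100, 1000, 1000000)
randfloats.shape = (100000, 10)
dffloats = pd.DataFrame(randfloats, columns=list("ABCDEFGHIJ"))
jsonfloats = dffloats.to_json()

%timeit pd.read_json(jsonfloats)
%timeit pd.read_json(jsonfloats, numpy=True)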

The speedup is less noticeable for smaller datasets


Warning

Direct NumPy decoding makes a number of assumptions and may fail or produce unexpected output if these assumptions are not satisfied

  • data is numeric

  • data is uniform. The dtype is sniffed from the first value decoded. A ValueError may be raised, or incorrect output may be produced, if this condition is not satisfied

  • labels are ordered. Labels are only read from the first container; it is assumed that each subsequent row / column has been encoded in the same order. This should be satisfied if the data was encoded using to_json, but may not be the case if the JSON is from another source

Normalization

pandas provides a utility function to take a dict or list of dicts and normalize this semi-structured data into a flat table

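A minimal sketch of json_normalize on a list of dicts (the records are illustrative):

data = [
    {"id": 1, "name": {"first": "Coleen", "last": "Volk"}},
    {"name": {"given": "Mark", "family": "Regner"}},
    {"id": 2, "name": "Faye Raker"},
]
pd.json_normalize(data)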

The max_level parameter provides more control over the level at which to end normalization. With max_level=1, normalization stops at the first nesting level of the provided dict.

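A sketch (the nested record is illustrative):

data = [{
    "CreatedBy": {"Name": "User001"},
    "Lookup": {"TextField": "Some text",
               "UserField": {"Id": "ID001", "Name": "Name001"}},
    "Image": {"a": "b"},
}]
pd.json_normalize(data, max_level=1)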

Line delimited json

pandas is able to read and write line-delimited json files that are common in data processing pipelines using Hadoop or Spark

For line-delimited json files, pandas can also return an iterator which reads in chunksize lines at a time. This can be useful for large files or to read from a stream.

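A minimal sketch:

from io import StringIO

jsonl = '{"a": 1, "b": 2}\n{"a": 3, "b": 4}'
pd.read_json(StringIO(jsonl), lines=True)

# or stream it, one line per chunk
reader = pd.read_json(StringIO(jsonl), lines=True, chunksize=1)
for chunk in reader:
    print(chunk)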

Table schema

Table Schema is a spec for describing tabular datasets as a JSON object. The JSON includes information on the field names, types, and other attributes. You can use the orient table to build a JSON string with two fields, schema and data.

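A sketch (the frame is illustrative):

df = pd.DataFrame(
    {"A": [1, 2, 3],
     "B": ["a", "b", "c"],
     "C": pd.date_range("2016-01-01", freq="D", periods=3)},
    index=pd.Index(range(3), name="idx"),
)
df.to_json(orient="table", date_format="iso")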

The schema field contains the fields key, which itself contains a list of column name to type pairs, including the Index or MultiIndex (see below for a list of types). The schema field also contains a primaryKey field if the (Multi)index is unique.

The second field, data, contains the serialized data with the records orient. The index is included, and any datetimes are ISO 8601 formatted, as required by the Table Schema spec.

The full list of types supported is described in the Table Schema spec. This table shows the mapping from pandas types:

pandas type         Table Schema type
int64               integer
float64             number
bool                boolean
datetime64[ns]      datetime
timedelta64[ns]     duration
categorical         any
object              str

A few notes on the generated table schema

  • The schema object contains a pandas_version field. This contains the version of pandas' dialect of the schema, and will be incremented with each revision

  • All dates are converted to UTC when serializing. Even timezone naive values, which are treated as UTC with an offset of 0


  • datetimes with a timezone (before serializing) include an additional field tz with the time zone name (e.g. 'US/Central')

  • Periods are converted to timestamps before serialization, and so have the same behavior of being converted to UTC. In addition, periods will contain an additional field freq with the period's frequency, e.g. 'A-DEC'

  • Categoricals use the any type and an enum constraint listing the set of possible values. Additionally, an ordered field is included

  • A primaryKey field, containing an array of labels, is included if the index is unique

  • The primaryKey behavior is the same with MultiIndexes, but in this case the primaryKey is an array

  • The default naming roughly follows these rules

    • For Series, the Series.name is used. If that is None, then the name is values

    • For DataFrames, the stringified version of the column name is used

    • For Index (not MultiIndex), index.name is used, with a fallback to index if that is None

    • For MultiIndex, mi.names is used. If any level has no name, then level_<i> is used (a short sketch of these naming rules follows this list)
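
    The following sketch (outputs omitted) illustrates the naming rules above with build_table_schema:

    # An unnamed Series index becomes the field 'index';
    # unnamed Series values become the field 'values'.
    build_table_schema(pd.Series([1.0, 2.0]))

    # Named MultiIndex levels keep their names; unnamed levels fall back to 'level_<i>'.
    mi2 = pd.MultiIndex.from_arrays([["a", "b"], [1, 2]], names=["key", None])
    build_table_schema(pd.Series([10, 20], index=mi2))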

read_json() also accepts orient="table" as an argument. This allows for the preservation of metadata such as dtypes and index names in a round-trippable manner:
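
A minimal round-trip sketch (column names and values are purely illustrative; recent pandas versions prefer literal JSON to be wrapped in StringIO):

from io import StringIO

df = pd.DataFrame(
    {"foo": [1, 2, 3], "bar": ["a", "b", "c"]},
    index=pd.Index(range(3), name="idx"),
)
json_str = df.to_json(orient="table")
new_df = pd.read_json(StringIO(json_str), orient="table")
new_df.dtypes        # dtypes are preserved
new_df.index.name    # the index name 'idx' is preserved as well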

Please note that the literal string ‘index’ as the name of an Index is not round-trippable, nor are any names beginning with ‘level_’ within a MultiIndex. These are used by default in DataFrame.to_json() to indicate missing values and the subsequent read cannot distinguish the intent.
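
A hedged sketch of that caveat, reusing StringIO from the previous sketch (exact behavior can vary slightly between pandas versions):

df = pd.DataFrame({"a": [1, 2]}, index=pd.Index([5, 6], name="index"))
round_tripped = pd.read_json(StringIO(df.to_json(orient="table")), orient="table")
round_tripped.index.name   # typically None: the literal name 'index' is not preserved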

When using orient="table" along with user-defined ExtensionArray, the generated schema will contain an additional extDtype key in the respective fields element. This extra key is not standard but does enable JSON roundtrips for extension types (e.g. read_json(df.to_json(orient="table"), orient="table")).

The extDtype key carries the name of the extension; if you have properly registered the ExtensionDtype, pandas will use said name to perform a lookup into the registry and re-convert the serialized data into your custom dtype.

HTML

Reading HTML content

Warning

We highly encourage you to read the HTML table parsing gotchas below regarding the issues surrounding the BeautifulSoup4/html5lib/lxml parsers.

The top-level read_html() function can accept an HTML string/file/URL and will parse HTML tables into a list of pandas DataFrames. Let’s look at a few examples.

Note

read_html returns a list of DataFrame objects, even if there is only a single table contained in the HTML content.

Read a URL with no options

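A sketch of the call; the URL below is the FDIC failed-bank list used in the pandas documentation and is assumed here only for illustration (it may move or change):

url = "https://www.fdic.gov/resources/resolutions/bank-failures/failed-bank-list"
dfs = pd.read_html(url)   # a list with one DataFrame per <table> found on the page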

Note

The data from the above URL changes every Monday, so the resulting data above may be slightly different.

Read in the content of the file from the above URL and pass it to read_html as a string:
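
One way to do this, sketched with the standard library and reusing the url variable from the previous example (newer pandas may ask for the string to be wrapped in StringIO):

from urllib.request import urlopen

with urlopen(url) as resp:
    html_str = resp.read().decode("utf-8")

dfs = pd.read_html(html_str)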

You can even pass in an instance of StringIO if you so desire:
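
For example, reusing html_str from the sketch above:

from io import StringIO

dfs = pd.read_html(StringIO(html_str))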

Note

The following examples are not run by the IPython evaluator because having so many network-accessing functions slows down the documentation build. If you spot an error or an example that doesn’t run, please report it on the pandas GitHub issues page.

Read a URL and match a table that contains specific text

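A sketch reusing the illustrative url; the match string "Metcalf Bank" is the example text used in the pandas documentation:

match = "Metcalf Bank"
df_list = pd.read_html(url, match=match)   # only tables containing this text are returned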

Specify a header row (by default <th> or <td> elements located within a <thead> are used to form the column index; if multiple rows are contained within <thead> then a MultiIndex is created); if specified, the header row is taken from the data minus the parsed header elements (<th> elements):
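
For example (reusing the illustrative url):

dfs = pd.read_html(url, header=0)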

Specify an index column

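For example:

dfs = pd.read_html(url, index_col=0)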

Specify a number of rows to skip

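For example:

dfs = pd.read_html(url, skiprows=0)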

Specify a number of rows to skip using a list (range works as well):
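
For example:

dfs = pd.read_html(url, skiprows=range(2))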

Specify an HTML attribute:
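
For example (the id and class values are illustrative and must match attributes that actually exist in the page being parsed):

dfs1 = pd.read_html(url, attrs={"id": "table"})
dfs2 = pd.read_html(url, attrs={"class": "sortable"})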

Specify values that should be converted to NaN

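For example (the replacement string is illustrative):

dfs = pd.read_html(url, na_values=["No Acquirer"])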

Specify whether to keep the default set of NaN values

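For example:

dfs = pd.read_html(url, keep_default_na=False)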

Specify converters for columns. This is useful for numerical text data that has leading zeros. By default columns that are numerical are cast to numeric types and the leading zeros are lost. To avoid this, we can convert these columns to strings

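A sketch along the lines of the pandas documentation example; the Wikipedia URL, match string, and column name are assumptions used only for illustration:

url_mcc = "https://en.wikipedia.org/wiki/Mobile_country_code"
dfs = pd.read_html(
    url_mcc,
    match="Telekom Albania",
    header=0,
    converters={"MNC": str},   # parse the column as str so leading zeros survive
)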

Use some combination of the above

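For example:

dfs = pd.read_html(url, match="Metcalf Bank", index_col=0)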

Read in pandas to_html output (with some loss of floating point precision):
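
A round-trip sketch (numpy assumed imported as np; StringIO is used because passing literal HTML strings is deprecated in newer pandas):

import numpy as np
from io import StringIO

df = pd.DataFrame(np.random.randn(2, 2))
html = df.to_html(float_format="{0:.40g}".format)
df_back = pd.read_html(StringIO(html), index_col=0)[0]   # close to df, up to float precision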

The lxml backend will raise an error on a failed parse if that is the only parser you provide. If you only have a single parser you can provide just a string, but it is considered good practice to pass a list with one string if, for example, the function expects a sequence of strings. You may use:
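
For instance (reusing the illustrative url):

dfs = pd.read_html(url, match="Metcalf Bank", index_col=0, flavor=["lxml"])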

Or you could pass flavor="lxml" without a list:
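
For example:

dfs = pd.read_html(url, match="Metcalf Bank", index_col=0, flavor="lxml")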

However, if you have bs4 and html5lib installed and pass None or ["lxml", "bs4"] then the parse will most likely succeed. Note that as soon as a parse succeeds, the function will return.
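
For example:

dfs = pd.read_html(url, match="Metcalf Bank", index_col=0, flavor=["lxml", "bs4"])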

Links can be extracted from cells along with the text using extract_links="all":
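
A small sketch with an inline table (the table contents are illustrative):

from io import StringIO

html_table = """<table>
  <tr><th>GitHub</th></tr>
  <tr><td><a href="https://github.com/pandas-dev/pandas">pandas</a></td></tr>
</table>"""

df = pd.read_html(StringIO(html_table), extract_links="all")[0]
# Each cell becomes a (text, link) tuple, e.g.
# ('pandas', 'https://github.com/pandas-dev/pandas'); cells without a link get None.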

New in version 1.5.0.

Writing to HTML files

DataFrame objects have an instance method to_html which renders the contents of the DataFrame as an HTML table. The function arguments are as in the method to_string described above.

Note

Not all of the possible options for DataFrame.to_html are shown here for brevity’s sake. See to_html() for the full set of options.

Note

In an HTML-rendering supported environment like a Jupyter Notebook, display(HTML(...)) will render the raw HTML into the environment.

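A minimal sketch (numpy assumed imported as np):

df = pd.DataFrame(np.random.randn(2, 2))
print(df.to_html())   # the DataFrame rendered as an HTML <table> string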

The columns argument will limit the columns shown:
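
For example, keeping only the first column of the df above:

print(df.to_html(columns=[0]))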

float_format takes a Python callable to control the precision of floating point values:
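
For example, forcing ten decimal places:

print(df.to_html(float_format="{0:.10f}".format))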
