Welcome to the absolute beginner’s guide to NumPy! If you have comments or suggestions, please don’t hesitate to reach out!
Welcome to NumPy!

NumPy (Numerical Python) is an open source Python library that’s used in almost every field of science and engineering. It’s the universal standard for working with numerical data in Python, and it’s at the core of the scientific Python and PyData ecosystems. NumPy users include everyone from beginning coders to experienced researchers doing state-of-the-art scientific and industrial research and development. The NumPy API is used extensively in Pandas, SciPy, Matplotlib, scikit-learn, scikit-image and most other data science and scientific Python packages.

The NumPy library contains multidimensional array and matrix data structures (you’ll find more information about this in later sections). It provides ndarray, a homogeneous n-dimensional array object, with methods to efficiently operate on it. NumPy can be used to perform a wide variety of mathematical operations on arrays. It adds powerful data structures to Python that guarantee efficient calculations with arrays and matrices and it supplies an enormous library of high-level mathematical functions that operate on these arrays and matrices.

Installing NumPy

To install NumPy, we strongly recommend using a scientific Python distribution. If you’re looking for the full instructions for installing NumPy on your operating system, see Installing NumPy.

If you already have Python, you can install NumPy with:

    conda install numpy

or

    pip install numpy

If you don’t have Python yet, you might want to consider using Anaconda. It’s the easiest way to get started. The advantage of this distribution is that you don’t need to worry too much about separately installing NumPy or any of the major packages that you’ll be using for your data analyses, such as pandas and scikit-learn.
How to import NumPy

To access NumPy and its functions, import it in your Python code like this:

    import numpy as np

We shorten the imported name to np for better readability of code using NumPy. This is a widely adopted convention that you should follow so that anyone working with your code can easily understand it.

Reading the example code

If you aren’t already comfortable with reading tutorials that contain a lot of code, you might not know how to interpret a code block that looks like this:

    >>> a = np.arange(6)
    >>> a2 = a[np.newaxis, :]
    >>> a2.shape
    (1, 6)

If you aren’t familiar with this style, it’s very easy to understand. If you see >>>, you’re looking at input, or the code that you would enter. Everything that doesn’t have >>> in front of it is output, or the results of running your code. This is the style you see when you run python on the command line, but if you’re using IPython, you might see a different style. Note that the >>> prompt is not part of the code and will cause an error if typed or pasted into the Python shell. It can be safely typed or pasted into the IPython shell; the >>> is ignored.

What’s the difference between a Python list and a NumPy array?

NumPy gives you an enormous range of fast and efficient ways of creating arrays and manipulating numerical data inside them. While a Python list can contain different data types within a single list, all of the elements in a NumPy array should be homogeneous. The mathematical operations that are meant to be performed on arrays would be extremely inefficient if the arrays weren’t homogeneous.

Why use NumPy?

NumPy arrays are faster and more compact than Python lists. An array consumes less memory and is convenient to use. NumPy uses much less memory to store data and it provides a mechanism for specifying the data types. This allows the code to be optimized even further.
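As a rough illustration of the two points above (homogeneous elements and compact storage), here is a minimal sketch. The variable names and the element count are illustrative assumptions, and the exact size of the Python list depends on your platform and Python version, so only the comparison is printed.

```python
import sys
import numpy as np

# A Python list may mix types; each element is a separate Python object.
mixed_list = [1, 2.5, 3]

# np.array chooses a single dtype for all elements (the ints are upcast to float).
arr = np.array(mixed_list)
print(arr.dtype)

# Compare the storage needed for the same integers in a list and in an array.
n = 100_000
py_list = list(range(n))
np_arr = np.arange(n, dtype=np.int64)

# A list stores pointers plus one full Python object per element.
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(x) for x in py_list)

# An int64 array stores exactly 8 bytes per element.
print(np_arr.nbytes)
print(list_bytes > np_arr.nbytes)
```

The `dtype` attribute and `nbytes` are standard ndarray attributes; `sys.getsizeof` only gives an approximate picture of the list’s footprint.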
What is an array?

An array is a central data structure of the NumPy library. An array is a grid of values and it contains information about the raw data, how to locate an element, and how to interpret an element. It has a grid of elements that can be indexed in various ways. The elements are all of the same type, referred to as the array dtype.

An array can be indexed by a tuple of nonnegative integers, by booleans, by another array, or by integers. The rank of the array is the number of dimensions. The shape of the array is a tuple of integers giving the size of the array along each dimension.

One way we can initialize NumPy arrays is from Python lists, using nested lists for two- or higher-dimensional data.

For example:

    >>> a = np.array([1, 2, 3, 4, 5, 6])

or:

    >>> a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

We can access the elements in the array using square brackets. When you’re accessing elements, remember that indexing in NumPy starts at 0. That means that if you want to access the first element in your array, you’ll be accessing element “0”.

    >>> print(a[0])
    [1 2 3 4]

More information about arrays

This section covers 1D array, 2D array, ndarray, vector, matrix

You might occasionally hear an array referred to as an “ndarray,” which is shorthand for “N-dimensional array.” An N-dimensional array is simply an array with any number of dimensions. You might also hear 1-D, or one-dimensional array, 2-D, or two-dimensional array, and so on. The NumPy ndarray class is used to represent both matrices and vectors. A vector is an array with a single dimension (there’s no difference between row and column vectors), while a matrix refers to an array with two dimensions. For 3-D or higher dimensional arrays, the term tensor is also commonly used.

What are the attributes of an array?

An array is usually a fixed-size container of items of the same type and size.
The number of dimensions and items in an array is defined by its shape. The shape of an array is a tuple of non-negative integers that specify the sizes of each dimension.

In NumPy, dimensions are called axes. This means that if you have a 2D array that looks like this:

    [[0., 0., 0.],
     [1., 1., 1.]]

Your array has 2 axes. The first axis has a length of 2 and the second axis has a length of 3.

Just like in other Python container objects, the contents of an array can be accessed and modified by indexing or slicing the array. Unlike the typical container objects, different arrays can share the same data, so changes made on one array might be visible in another.

Array attributes reflect information intrinsic to the array itself. If you need to get, or even set, properties of an array without creating a new array, you can often access an array through its attributes.

How to create a basic array

This section covers np.array(), np.zeros(), np.ones(), np.empty(), np.arange(), np.linspace(), dtype

To create a NumPy array, you can use the function np.array(). All you need to do to create a simple array is pass a list to it. If you choose to, you can also specify the type of data in your list.

    >>> import numpy as np
    >>> a = np.array([1, 2, 3])

You can visualize your array this way: (figure omitted)

Be aware that these visualizations are meant to simplify ideas and give you a basic understanding of NumPy concepts and mechanics. Arrays and array operations are much more complicated than are captured here!

Besides creating an array from a sequence of elements, you can easily create an array filled with 0’s:

    >>> np.zeros(2)
    array([0., 0.])

Or an array filled with 1’s:

    >>> np.ones(2)
    array([1., 1.])

Or even an empty array! The function empty creates an array whose initial content is random and depends on the state of the memory.
The reason to use empty over zeros (or something similar) is speed - just make sure to fill every element afterwards!

    >>> # Create an empty array with 2 elements; its contents are arbitrary
    >>> np.empty(2)

You can create an array with a range of elements:

    >>> np.arange(4)
    array([0, 1, 2, 3])

And even an array that contains a range of evenly spaced intervals. To do this, you will specify the first number, last number, and the step size. The last number itself is not included in the result.

    >>> np.arange(2, 9, 2)
    array([2, 4, 6, 8])

You can also use np.linspace() to create an array with values that are spaced linearly in a specified interval:

    >>> np.linspace(0, 10, num=5)
    array([ 0. ,  2.5,  5. ,  7.5, 10. ])

Specifying your data type

While the default data type is floating point (np.float64), you can explicitly specify which data type you want using the dtype keyword.

    >>> x = np.ones(2, dtype=np.int64)
    >>> x
    array([1, 1])

Adding, removing, and sorting elements

This section covers np.sort(), np.concatenate()

Sorting an array is simple with np.sort(). You can specify the axis, kind, and order when you call the function.

If you start with this array:

    >>> arr = np.array([2, 1, 5, 3, 7, 4, 6, 8])

You can quickly sort the numbers in ascending order with:

    >>> np.sort(arr)
    array([1, 2, 3, 4, 5, 6, 7, 8])

In addition to sort, which returns a sorted copy of an array, you can use: argsort, which is an indirect sort along a specified axis; lexsort, which is an indirect stable sort on multiple keys; searchsorted, which finds elements in a sorted array; and partition, which is a partial sort.
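As a minimal sketch of some related sorting helpers that NumPy provides, the example below uses np.argsort, np.searchsorted, and np.partition. The array values and variable names are illustrative assumptions, not taken from the guide.

```python
import numpy as np

arr = np.array([2, 1, 5, 3, 7, 4, 6, 8])

# argsort returns the indices that would sort the array (an indirect sort).
order = np.argsort(arr)
sorted_arr = arr[order]          # same values as np.sort(arr)

# searchsorted expects an already sorted array and returns the insertion index.
pos = np.searchsorted(sorted_arr, 5)

# partition places the element at index 3 in its final sorted position;
# smaller values end up before it and larger values after it, in no fixed order.
part = np.partition(arr, 3)

print(order)
print(sorted_arr)
print(pos)
print(part[3])
```

Indexing with the result of argsort is handy when you need to reorder a second array in the same way as the first.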
To read more about sorting an array, see the np.sort reference documentation.

If you start with these arrays:

    >>> a = np.array([1, 2, 3, 4])
    >>> b = np.array([5, 6, 7, 8])

You can concatenate them with np.concatenate().

    >>> np.concatenate((a, b))
    array([1, 2, 3, 4, 5, 6, 7, 8])

Or, if you start with these arrays:

    >>> x = np.array([[1, 2], [3, 4]])
    >>> y = np.array([[5, 6]])

You can concatenate them with:

    >>> np.concatenate((x, y), axis=0)
    array([[1, 2],
           [3, 4],
           [5, 6]])

In order to remove elements from an array, it’s simple to use indexing to select the elements that you want to keep.

To read more about concatenate, see the np.concatenate reference documentation.

How do you know the shape and size of an array?

This section covers ndarray.ndim, ndarray.size, ndarray.shape

ndarray.ndim will tell you the number of axes, or dimensions, of the array.

ndarray.size will tell you the total number of elements of the array. This is the product of the elements of the array’s shape.

ndarray.shape will display a tuple of integers that indicate the number of elements stored along each dimension of the array. If, for example, you have a 2-D array with 2 rows and 3 columns, the shape of your array is (2, 3).

For example, if you create this array:

    >>> array_example = np.array([[[0, 1, 2, 3],
    ...                            [4, 5, 6, 7]],
    ...                           [[0, 1, 2, 3],
    ...                            [4, 5, 6, 7]],
    ...                           [[0, 1, 2, 3],
    ...                            [4, 5, 6, 7]]])

To find the number of dimensions of the array, run:

    >>> array_example.ndim
    3

To find the total number of elements in the array, run:

    >>> array_example.size
    24

And to find the shape of your array, run:

    >>> array_example.shape
    (3, 2, 4)

Can you reshape an array?

This section covers arr.reshape()

Yes! Using arr.reshape() will give a new shape to an array without changing the data. Just remember that when you use the reshape method, the array you want to produce needs to have the same number of elements as the original array. If you start with an array with 12 elements, you’ll need to make sure that your new array also has a total of 12 elements.

If you start with this array:

    >>> a = np.arange(6)
    >>> print(a)
    [0 1 2 3 4 5]

You can use reshape() to reshape your array.
For example, you can reshape this array to an array with three rows and two columns:

    >>> b = a.reshape(3, 2)
    >>> print(b)
    [[0 1]
     [2 3]
     [4 5]]

With np.reshape, you can specify a few optional parameters:

    >>> np.reshape(a, (1, 6), order='C')
    array([[0, 1, 2, 3, 4, 5]])

a is the array to be reshaped.

The second argument is the new shape you want. You can specify an integer or a tuple of integers. If you specify an integer, the result will be an array of that length. The shape should be compatible with the original shape.

order: C means to read/write the elements using C-like index order, F means to read/write the elements using Fortran-like index order, A means to read/write the elements in Fortran-like index order if a is Fortran contiguous in memory, C-like order otherwise. (This is an optional parameter and doesn’t need to be specified.)

If you want to learn more about C and Fortran order, you can read more about the internal organization of NumPy arrays in the NumPy documentation. Essentially, C and Fortran orders have to do with how indices correspond to the order the array is stored in memory. In Fortran, when moving through the elements of a two-dimensional array as it is stored in memory, the first index is the most rapidly varying index. As the first index moves to the next row as it changes, the matrix is stored one column at a time. This is why Fortran is thought of as a Column-major language. In C on the other hand, the last index changes the most rapidly. The matrix is stored by rows, making it a Row-major language. What you do for C or Fortran depends on whether it’s more important to preserve the indexing convention or not reorder the data.

How to convert a 1D array into a 2D array (how to add a new axis to an array)

This section covers np.newaxis, np.expand_dims

You can use np.newaxis and np.expand_dims to increase the dimensions of your existing array.

Using np.newaxis will increase the dimensions of your array by one dimension when used once.
This means that a 1D array will become a 2D array, a 2D array will become a 3D array, and so on.

For example, if you start with this array:

    >>> a = np.array([1, 2, 3, 4, 5, 6])
    >>> a.shape
    (6,)

You can use np.newaxis to add a new axis:

    >>> a2 = a[np.newaxis, :]
    >>> a2.shape
    (1, 6)

You can explicitly convert a 1D array to either a row vector or a column vector using np.newaxis. For example, you can convert a 1D array to a row vector by inserting an axis along the first dimension:

    >>> row_vector = a[np.newaxis, :]
    >>> row_vector.shape
    (1, 6)

Or, for a column vector, you can insert an axis along the second dimension:

    >>> col_vector = a[:, np.newaxis]
    >>> col_vector.shape
    (6, 1)

You can also expand an array by inserting a new axis at a specified position with np.expand_dims.

For example, if you start with this array:

    >>> a = np.array([1, 2, 3, 4, 5, 6])
    >>> a.shape
    (6,)

You can use np.expand_dims to add an axis at index position 1 with:

    >>> b = np.expand_dims(a, axis=1)
    >>> b.shape
    (6, 1)

You can add an axis at index position 0 with:

    >>> c = np.expand_dims(a, axis=0)
    >>> c.shape
    (1, 6)

Find more information about newaxis and expand_dims in the NumPy reference documentation.

Indexing and slicing

You can index and slice NumPy arrays in the same ways you can slice Python lists.

    >>> data = np.array([1, 2, 3])
    >>> print(data[1])
    2
    >>> data[0:2]
    array([1, 2])
    >>> data[1:]
    array([2, 3])
    >>> data[-2:]
    array([2, 3])

You can visualize it this way: (figure omitted)

You may want to take a section of your array or specific array elements to use in further analysis or additional operations. To do that, you’ll need to subset, slice, and/or index your arrays.

If you want to select values from your array that fulfill certain conditions, it’s straightforward with NumPy.

For example, if you start with this array:

    >>> a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

You can easily print all of the values in the array that are less than 5.
    >>> print(a[a < 5])
    [1 2 3 4]

You can also select, for example, numbers that are equal to or greater than 5, and use that condition to index an array.

    >>> five_up = (a >= 5)
    >>> print(a[five_up])
    [ 5  6  7  8  9 10 11 12]

You can select elements that are divisible by 2:

    >>> divisible_by_2 = a[a % 2 == 0]
    >>> print(divisible_by_2)
    [ 2  4  6  8 10 12]

Or you can select elements that satisfy two conditions using the & and | operators:

    >>> c = a[(a > 2) & (a < 11)]
    >>> print(c)
    [ 3  4  5  6  7  8  9 10]

You can also make use of the logical operators & and | in order to return boolean values that specify whether or not the values in an array fulfill a certain condition. This can be useful with arrays that contain names or other categorical values.

    >>> five_up = (a > 5) | (a == 5)
    >>> print(five_up)
    [[False False False False]
     [ True  True  True  True]
     [ True  True  True  True]]

You can also use np.nonzero() to select elements or indices from an array.

Starting with this array:

    >>> a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

You can use np.nonzero() to print the indices of elements that are, for example, less than 5:

    >>> b = np.nonzero(a < 5)
    >>> print(b)
    (array([0, 0, 0, 0]), array([0, 1, 2, 3]))

In this example, a tuple of arrays was returned: one for each dimension. The first array represents the row indices where these values are found, and the second array represents the column indices where the values are found.

If you want to generate a list of coordinates where the elements exist, you can zip the arrays, iterate over the list of coordinates, and print them. For example:

    >>> list_of_coordinates = list(zip(b[0], b[1]))
    >>> for coord in list_of_coordinates:
    ...     print(coord)

This prints one (row, column) pair for each element found.

You can also use np.nonzero() to print the elements in an array that are less than 5 with:

    >>> print(a[b])
    [1 2 3 4]

If the element you’re looking for doesn’t exist in the array, then the returned array of indices will be empty. For example:

    >>> not_there = np.nonzero(a == 42)
    >>> print(not_there)
    (array([], dtype=int64), array([], dtype=int64))

Read more about using the nonzero function in the np.nonzero reference documentation.
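The section above notes that boolean conditions can be useful with arrays that contain names or other categorical values. Here is a minimal sketch of that idea; the labels, scores, and variable names are made up for illustration and do not come from the guide.

```python
import numpy as np

# Hypothetical categorical labels and matching numeric values.
names = np.array(["ana", "bo", "ana", "cy", "bo"])
scores = np.array([90, 72, 85, 60, 78])

# A comparison on the labels gives a boolean mask with one entry per element.
is_ana = names == "ana"

# Masks can be combined with | (or) and & (and); the parentheses are required.
ana_or_cy = (names == "ana") | (names == "cy")
high_bo = (names == "bo") & (scores > 75)

# A mask built from one array can index another array of the same shape.
print(scores[is_ana])
print(scores[ana_or_cy])
print(scores[high_bo])

# np.nonzero turns a mask into index positions.
positions = np.nonzero(high_bo)[0]
print(positions)
```

Because the mask only depends on shape, the same pattern works for selecting rows of a 2-D array by a label column.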
How to create an array from existing data

This section covers slicing and indexing, np.vstack(), np.hstack(), np.hsplit(), .view(), copy()

You can easily create a new array from a section of an existing array.

Let’s say you have this array:

    >>> a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

You can create a new array from a section of your array any time by specifying where you want to slice your array.

    >>> arr1 = a[3:8]
    >>> arr1
    array([4, 5, 6, 7, 8])

Here, you grabbed a section of your array from index position 3 up to, but not including, index position 8.

You can also stack two existing arrays, both vertically and horizontally. Let’s say you have two arrays, a1 and a2:

    >>> a1 = np.array([[1, 1], [2, 2]])
    >>> a2 = np.array([[3, 3], [4, 4]])

You can stack them vertically with vstack:

    >>> np.vstack((a1, a2))
    array([[1, 1],
           [2, 2],
           [3, 3],
           [4, 4]])

Or stack them horizontally with hstack:

    >>> np.hstack((a1, a2))
    array([[1, 1, 3, 3],
           [2, 2, 4, 4]])

You can split an array into several smaller arrays using hsplit. You can specify either the number of equally shaped arrays to return or the columns after which the division should occur.

Let’s say you have this array:

    >>> x = np.arange(1, 25).reshape(2, 12)
    >>> x
    array([[ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12],
           [13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24]])

If you wanted to split this array into three equally shaped arrays, you would run:

    >>> np.hsplit(x, 3)

If you wanted to split your array after the third and fourth column, you’d run:

    >>> np.hsplit(x, (3, 4))

You can use the view method to create a new array object that looks at the same data as the original array (a shallow copy).

Views are an important NumPy concept! NumPy functions, as well as operations like indexing and slicing, will return views whenever possible. This saves memory and is faster (no copy of the data has to be made).
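One way to check whether an operation produced a view or a copy is np.shares_memory. The sketch below is illustrative only; the array and variable names are assumptions rather than part of the guide.

```python
import numpy as np

a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

# Basic slicing returns a view onto the same data.
middle_columns = a[:, 1:3]

# Indexing with a list of row numbers ("fancy" indexing) returns a copy.
picked_rows = a[[0, 2]]

# An explicit copy never shares data with the original.
independent = a.copy()

print(np.shares_memory(a, middle_columns))   # True
print(np.shares_memory(a, picked_rows))      # False
print(np.shares_memory(a, independent))      # False
```

A quick check like this can help when it is not obvious whether later assignments will affect the original array.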
However, it’s important to be aware of this - modifying data in a view also modifies the original array!

Let’s say you create this array:

    >>> a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

Now we create an array b1 by slicing a and modify the first element of b1. This will modify the corresponding element in a as well!

    >>> b1 = a[0, :]
    >>> b1
    array([1, 2, 3, 4])
    >>> b1[0] = 99
    >>> b1
    array([99,  2,  3,  4])
    >>> a
    array([[99,  2,  3,  4],
           [ 5,  6,  7,  8],
           [ 9, 10, 11, 12]])

Using the copy method will make a complete copy of the array and its data (a deep copy). To use this on your array, you could run:

    >>> b2 = a.copy()

Basic array operations

This section covers addition, subtraction, multiplication, division, and more

Once you’ve created your arrays, you can start to work with them. Let’s say, for example, that you’ve created two arrays, one called “data” and one called “ones”.

You can add the arrays together with the plus sign.

    >>> data = np.array([1, 2])
    >>> ones = np.ones(2, dtype=int)
    >>> data + ones
    array([2, 3])

You can, of course, do more than just addition!

    >>> data - ones
    array([0, 1])
    >>> data * data
    array([1, 4])
    >>> data / data
    array([1., 1.])

Basic operations are simple with NumPy. If you want to find the sum of the elements in an array, you’d use sum(). This works for 1D arrays, 2D arrays, and arrays in higher dimensions.

    >>> a = np.array([1, 2, 3, 4])
    >>> print(a.sum())
    10

To add the rows or the columns in a 2D array, you would specify the axis.

If you start with this array:

    >>> b = np.array([[1, 1], [2, 2]])

You can sum over the axis of rows with:

    >>> b.sum(axis=0)
    array([3, 3])

You can sum over the axis of columns with:

    >>> b.sum(axis=1)
    array([2, 4])

Broadcasting

There are times when you might want to carry out an operation between an array and a single number (also called an operation between a vector and a scalar) or between arrays of two different sizes. For example, your array (we’ll call it “data”) might contain information about distance in miles but you want to convert the information to kilometers.
You can perform this operation with:

    >>> data = np.array([1.0, 2.0])
    >>> data * 1.6
    array([1.6, 3.2])

NumPy understands that the multiplication should happen with each cell. That concept is called broadcasting. Broadcasting is a mechanism that allows NumPy to perform operations on arrays of different shapes. The dimensions of your array must be compatible, for example, when the dimensions of both arrays are equal or when one of them is 1. If the dimensions are not compatible, you will get a ValueError.

More useful array operations

This section covers maximum, minimum, sum, mean, product, standard deviation, and more

NumPy also performs aggregation functions. In addition to min, max, and sum, you can easily run mean to get the average, prod to get the result of multiplying the elements together, std to get the standard deviation, and more.

    >>> print(data.max())
    2.0
    >>> print(data.min())
    1.0
    >>> print(data.sum())
    3.0

Let’s start with a 2-D array, called “a”, that has four columns.

It’s very common to want to aggregate along a row or column. By default, every NumPy aggregation function will return the aggregate of the entire array. To find the sum or the minimum of the elements in your array, run:

    >>> a.sum()

Or:

    >>> a.min()

You can specify on which axis you want the aggregation function to be computed. For example, you can find the minimum value within each column by specifying axis=0.

    >>> a.min(axis=0)

The result contains one value per column. With a four-column array, you will get four values as your result.

Creating matrices

You can pass Python lists of lists to create a 2-D array (or “matrix”) to represent them in NumPy.
    >>> data = np.array([[1, 2], [3, 4], [5, 6]])
    >>> data
    array([[1, 2],
           [3, 4],
           [5, 6]])

Indexing and slicing operations are useful when you’re manipulating matrices:

    >>> print(data[0, 1])
    2
    >>> data[1:3]
    array([[3, 4],
           [5, 6]])
    >>> data[0:2, 0]
    array([1, 3])

You can aggregate matrices the same way you aggregated vectors:

    >>> print(data.max())
    6
    >>> print(data.min())
    1
    >>> print(data.sum())
    21

You can aggregate all the values in a matrix and you can aggregate them across columns or rows using the axis parameter. To illustrate this point, let’s look at a slightly modified dataset:

    >>> data = np.array([[1, 2], [5, 3], [4, 6]])
    >>> data.max(axis=0)
    array([5, 6])
    >>> data.max(axis=1)
    array([2, 5, 6])

Once you’ve created your matrices, you can add and multiply them using arithmetic operators if you have two matrices that are the same size.

    >>> data = np.array([[1, 2], [3, 4]])
    >>> ones = np.array([[1, 1], [1, 1]])
    >>> data + ones
    array([[2, 3],
           [4, 5]])

You can do these arithmetic operations on matrices of different sizes, but only if one matrix has only one column or one row. In this case, NumPy will use its broadcast rules for the operation.

    >>> data = np.array([[1, 2], [3, 4], [5, 6]])
    >>> ones_row = np.array([[1, 1]])
    >>> data + ones_row
    array([[2, 3],
           [4, 5],
           [6, 7]])

Be aware that when NumPy prints N-dimensional arrays, the last axis is looped over the fastest while the first axis is the slowest. For instance:

    >>> np.ones((4, 3, 2))
    array([[[1., 1.],
            [1., 1.],
            [1., 1.]],

           [[1., 1.],
            [1., 1.],
            [1., 1.]],

           [[1., 1.],
            [1., 1.],
            [1., 1.]],

           [[1., 1.],
            [1., 1.],
            [1., 1.]]])

There are often instances where we want NumPy to initialize the values of an array. NumPy offers functions like ones() and zeros(), and the random.Generator class for random number generation for that. All you need to do is pass in the number of elements you want it to generate:

    >>> np.ones(3)
    array([1., 1., 1.])
    >>> np.zeros(3)
    array([0., 0., 0.])
    >>> rng = np.random.default_rng()  # the simplest way to generate random numbers
    >>> rng.random(3)  # the values vary from run to run

You can also use ones(), zeros(), and random() to create a 2D array if you give them a tuple describing the dimensions of the matrix:

    >>> np.ones((3, 2))
    array([[1., 1.],
           [1., 1.],
           [1., 1.]])
    >>> np.zeros((3, 2))
    array([[0., 0.],
           [0., 0.],
           [0., 0.]])
    >>> rng.random((3, 2))  # the values vary from run to run

Read more about creating arrays, filled with 0’s, 1’s, other values or uninitialized, in the array creation routines documentation.

Generating random numbers

The use of random number generation is an important part of the configuration and evaluation of many numerical and machine learning algorithms.
Whether you need to randomly initialize weights in an artificial neural network, split data into random sets, or randomly shuffle your dataset, being able to generate random numbers (actually, repeatable pseudo-random numbers) is essential.

With Generator.integers, you can generate random integers from low (remember that this is inclusive with NumPy) to high (exclusive). You can set endpoint=True to make the high number inclusive.

You can generate a 2 x 4 array of random integers between 0 and 4 with:

    >>> rng.integers(5, size=(2, 4))  # the values vary from run to run

How to get unique items and counts

This section covers np.unique()

You can find the unique elements in an array easily with np.unique.

For example, if you start with this array:

    >>> a = np.array([11, 11, 12, 13, 14, 15, 16, 17, 12, 13, 11, 14, 18, 19, 20])

you can use np.unique to print the unique values in your array:

    >>> unique_values = np.unique(a)
    >>> print(unique_values)
    [11 12 13 14 15 16 17 18 19 20]

To get the indices of unique values in a NumPy array (an array of first index positions of unique values in the array), just pass the return_index argument in np.unique() as well as your array.

    >>> unique_values, indices_list = np.unique(a, return_index=True)
    >>> print(indices_list)
    [ 0  2  3  4  5  6  7 12 13 14]

You can pass the return_counts argument in np.unique() along with your array to get the frequency count of unique values in a NumPy array.

    >>> unique_values, occurrence_count = np.unique(a, return_counts=True)
    >>> print(occurrence_count)
    [3 2 2 2 1 1 1 1 1 1]

This also works with 2D arrays! If you start with this array:

    >>> a_2d = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [1, 2, 3, 4]])

You can find unique values with:

    >>> unique_values = np.unique(a_2d)
    >>> print(unique_values)
    [ 1  2  3  4  5  6  7  8  9 10 11 12]

If the axis argument isn’t passed, your 2D array will be flattened.

If you want to get the unique rows or columns, make sure to pass the axis argument. To find the unique rows, specify axis=0 and for columns, specify axis=1.
    >>> unique_rows = np.unique(a_2d, axis=0)
    >>> print(unique_rows)
    [[ 1  2  3  4]
     [ 5  6  7  8]
     [ 9 10 11 12]]

To get the unique rows, index position, and occurrence count, you can use:

    >>> unique_rows, indices, occurrence_count = np.unique(
    ...     a_2d, axis=0, return_counts=True, return_index=True)
    >>> print(indices)
    [0 1 2]
    >>> print(occurrence_count)
    [2 1 1]

To learn more about finding the unique elements in an array, see the np.unique reference documentation.

Transposing and reshaping a matrix

This section covers arr.reshape(), arr.transpose(), arr.T

It’s common to need to transpose your matrices. NumPy arrays have the property T that allows you to transpose a matrix.

You may also need to switch the dimensions of a matrix. This can happen when, for example, you have a model that expects a certain input shape that is different from your dataset. This is where the reshape method can be useful. You simply need to pass in the new dimensions that you want for the matrix.

    >>> data = np.array([[1, 2], [3, 4], [5, 6]])
    >>> data.reshape(2, 3)
    array([[1, 2, 3],
           [4, 5, 6]])
    >>> data.reshape(3, 2)
    array([[1, 2],
           [3, 4],
           [5, 6]])

You can also use .transpose() to reverse or change the axes of an array according to the values you specify.

If you start with this array:

    >>> arr = np.arange(6).reshape((2, 3))
    >>> arr
    array([[0, 1, 2],
           [3, 4, 5]])

You can transpose your array with arr.transpose().

    >>> arr.transpose()
    array([[0, 3],
           [1, 4],
           [2, 5]])

You can also use arr.T:

    >>> arr.T
    array([[0, 3],
           [1, 4],
           [2, 5]])

To learn more about transposing and reshaping arrays, see the transpose and reshape reference documentation.

How to reverse an array

This section covers np.flip()

NumPy’s np.flip() function allows you to flip, or reverse, the contents of an array along an axis. When using np.flip(), specify the array you would like to reverse and the axis. If you don’t specify the axis, NumPy will reverse the contents along all of the axes of your input array.

Reversing a 1D array

If you begin with a 1D array like this one:

    >>> arr = np.array([1, 2, 3, 4, 5, 6, 7, 8])

You can reverse it with:

    >>> reversed_arr = np.flip(arr)

If you want to print your reversed array, you can run:

    >>> print('Reversed Array: ', reversed_arr)
    Reversed Array:  [8 7 6 5 4 3 2 1]

Reversing a 2D array

A 2D array works much the same way.
If you start with this array:

    >>> arr_2d = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

You can reverse the content in all of the rows and all of the columns with:

    >>> reversed_arr = np.flip(arr_2d)
    >>> print(reversed_arr)
    [[12 11 10  9]
     [ 8  7  6  5]
     [ 4  3  2  1]]

You can easily reverse only the rows with:

    >>> reversed_arr_rows = np.flip(arr_2d, axis=0)
    >>> print(reversed_arr_rows)
    [[ 9 10 11 12]
     [ 5  6  7  8]
     [ 1  2  3  4]]

Or reverse only the columns with:

    >>> reversed_arr_columns = np.flip(arr_2d, axis=1)
    >>> print(reversed_arr_columns)
    [[ 4  3  2  1]
     [ 8  7  6  5]
     [12 11 10  9]]

You can also reverse the contents of only one column or row. For example, you can reverse the contents of the row at index position 1 (the second row):

    >>> arr_2d[1] = np.flip(arr_2d[1])
    >>> print(arr_2d)
    [[ 1  2  3  4]
     [ 8  7  6  5]
     [ 9 10 11 12]]

You can also reverse the column at index position 1 (the second column):

    >>> arr_2d[:, 1] = np.flip(arr_2d[:, 1])
    >>> print(arr_2d)
    [[ 1 10  3  4]
     [ 8  7  6  5]
     [ 9  2 11 12]]

Read more about reversing arrays in the np.flip reference documentation.

Reshaping and flattening multidimensional arrays

This section covers .flatten(), ravel()

There are two popular ways to flatten an array: .flatten() and .ravel(). The primary difference between the two is that the new array created using ravel() is actually a reference to the parent array (i.e., a “view”). This means that any changes to the new array will affect the parent array as well. Since ravel does not create a copy, it’s memory efficient.

If you start with this array:

    >>> x = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

You can use flatten to flatten your array into a 1D array.

    >>> x.flatten()
    array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12])

When you use flatten, changes to your new array won’t change the parent array.

For example:

    >>> a1 = x.flatten()
    >>> a1[0] = 99
    >>> print(x)  # Original array
    [[ 1  2  3  4]
     [ 5  6  7  8]
     [ 9 10 11 12]]
    >>> print(a1)  # New array
    [99  2  3  4  5  6  7  8  9 10 11 12]

But when you use ravel, the changes you make to the new array will affect the parent array.

For example:

    >>> a2 = x.ravel()
    >>> a2[0] = 98
    >>> print(x)  # Original array
    [[98  2  3  4]
     [ 5  6  7  8]
     [ 9 10 11 12]]
    >>> print(a2)  # New array
    [98  2  3  4  5  6  7  8  9 10 11 12]

Read more about flatten and ravel in the ndarray.flatten and np.ravel reference documentation.

How to access the docstring for more information

This section covers help(), ?, ??

When it comes to the data science ecosystem, Python and NumPy are built with the user in mind. One of the best examples of this is the built-in access to documentation. Every object contains the reference to a string, which is known as the docstring.
In most cases, this docstring contains a quick and concise summary of the object and how to use it. Python has a built-in help() function that can help you access this information. This means that nearly any time you need more information, you can use help() to quickly find the information that you need.

For example:

    >>> help(max)

Because access to additional information is so useful, IPython uses the ? character as a shorthand for accessing this documentation along with other relevant information. IPython is a command shell for interactive computing in multiple languages. You can find more information about IPython in the IPython documentation.

For example:

    In [0]: max?

You can even use this notation for object methods and objects themselves.

Let’s say you create this array:

    >>> a = np.array([1, 2, 3, 4, 5, 6])

Then you can obtain a lot of useful information (first details about a itself, followed by the docstring of ndarray of which a is an instance):

    In [1]: a?

This also works for functions and other objects that you create. Just remember to include a docstring with your function using a string literal (""" or ''' around your documentation).

For example, if you create this function:

    >>> def double(a):
    ...     '''Return a * 2'''
    ...     return a * 2

You can obtain information about the function:

    In [2]: double?

You can reach another level of information by reading the source code of the object you’re interested in. Using a double question mark (??) allows you to access the source code.

For example:

    In [3]: double??

If the object in question is compiled in a language other than Python, using ?? will return the same information as ?.
You’ll find this with a lot of built-in objects and types, for example:

    In [4]: len?

and:

    In [5]: len??

have the same output because they were compiled in a programming language other than Python.

Working with mathematical formulas

The ease of implementing mathematical formulas that work on arrays is one of the things that make NumPy so widely used in the scientific Python community.

For example, this is the mean square error formula (a central formula used in supervised machine learning models that deal with regression):

    $$\mathrm{MeanSquareError} = \frac{1}{n} \sum_{i=1}^{n} \left(\mathrm{predictions}_i - \mathrm{labels}_i\right)^2$$

Implementing this formula is simple and straightforward in NumPy:

    error = (1/n) * np.sum(np.square(predictions - labels))

What makes this work so well is that predictions and labels can contain one or a thousand values. They only need to be the same size.

You can visualize it this way: (figure omitted)

In this example, both the predictions and labels vectors contain three values, meaning n has a value of three. After we carry out subtractions, the values in the vector are squared. Then NumPy sums the values, and your result is the error value for that prediction and a score for the quality of the model.

How to save and load NumPy objects

This section covers np.save, np.savez, np.savetxt, np.load, np.loadtxt

You will, at some point, want to save your arrays to disk and load them back without having to re-run the code. Fortunately, there are several ways to save and load objects with NumPy.
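As a minimal round-trip sketch of saving and loading, the example below writes one array to a .npy file and two arrays to a .npz file, then reads them back. The temporary directory, the file names, and the keyword names first and second are assumptions chosen for illustration.

```python
import os
import tempfile
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6])
b = np.array([[1.5, 2.5], [3.5, 4.5]])

# Work in a temporary directory so the example leaves no files behind.
with tempfile.TemporaryDirectory() as tmp:
    # A single array goes into a .npy file.
    npy_path = os.path.join(tmp, "single.npy")
    np.save(npy_path, a)
    a_loaded = np.load(npy_path)

    # Several arrays go into one .npz file; keyword names become the keys.
    npz_path = os.path.join(tmp, "several.npz")
    np.savez(npz_path, first=a, second=b)
    with np.load(npz_path) as archive:
        stored_names = sorted(archive.files)
        b_loaded = archive["second"]

print(a_loaded)
print(stored_names)
print(b_loaded.dtype, b_loaded.shape)
```

The loaded arrays keep the dtype and shape of the originals, which is the main practical difference from a plain text file.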
The ndarray objects can be saved to and loaded from disk files with the loadtxt and savetxt functions that handle normal text files, the load and save functions that handle NumPy binary files with a .npy file extension, and a savez function that handles NumPy files with a .npz file extension.

The .npy and .npz files store data, shape, dtype, and other information required to reconstruct the ndarray in a way that allows the array to be correctly retrieved, even when the file is on another machine with different architecture.

If you want to store a single ndarray object, store it as a .npy file using np.save. If you want to store more than one ndarray object in a single file, save it as a .npz file using np.savez. You can also save several arrays into a single file in compressed npz format with np.savez_compressed.

It’s easy to save and load an array with np.save(). Just make sure to specify the array you want to save and a file name. For example, if you create this array:

>>> a = np.array([1, 2, 3, 4, 5, 6])

You can save it as “filename.npy” with:

>>> np.save('filename', a)

You can use np.load() to reconstruct your array.

>>> b = np.load('filename.npy')

If you want to check your array, you can run:

>>> print(b)
[1 2 3 4 5 6]

You can save a NumPy array as a plain text file like a .csv or .txt file with np.savetxt.
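As a brief aside on the binary formats described above, here is a minimal sketch of a round trip through both .npy and .npz files. The temporary directory, the archive name arrays.npz, and the keyword names first and second are assumptions chosen for illustration; they are not part of the original tutorial.

```python
import os
import tempfile

import numpy as np

a = np.array([1, 2, 3, 4, 5, 6])
m = np.arange(6).reshape(2, 3)

with tempfile.TemporaryDirectory() as tmp:
    # np.save appends ".npy" when the file name lacks that extension.
    single_path = os.path.join(tmp, "filename")
    np.save(single_path, a)
    b = np.load(single_path + ".npy")

    # np.savez stores several arrays under the keyword names given here.
    archive_path = os.path.join(tmp, "arrays.npz")
    np.savez(archive_path, first=a, second=m)

    # Loading a .npz file returns a dictionary-like object keyed by name.
    with np.load(archive_path) as archive:
        names = sorted(archive.files)
        first = archive["first"]
        second = archive["second"]

print(b)
print(names)
print(second.shape)
```

Shape and dtype survive the round trip, which is the main practical difference from plain text files. Swapping np.savez for np.savez_compressed writes the same archive layout with compression.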
For example, to save an array as text with np.savetxt, if you create this array:

>>> csv_arr = np.array([1, 2, 3, 4, 5, 6, 7, 8])

You can easily save it as a .csv file with the name “new_file.csv” like this:

>>> np.savetxt('new_file.csv', csv_arr)

You can quickly and easily load your saved text file using np.loadtxt():

>>> np.loadtxt('new_file.csv')
array([1., 2., 3., 4., 5., 6., 7., 8.])

The savetxt() and loadtxt() functions accept additional optional parameters such as header, footer, and delimiter. While text files can be easier for sharing, .npy and .npz files are smaller and faster to read. If you need more sophisticated handling of your text file (for example, if you need to work with lines that contain missing values), you will want to use the genfromtxt function.

With savetxt, you can specify headers, footers, comments, and more. Learn more about genfromtxt in the NumPy reference documentation.

Importing and exporting a CSV

It’s simple to read in a CSV that contains existing information. The best and easiest way to do this is to use Pandas: its read_csv function loads the file into a dataframe, which you can then convert to a NumPy array with the dataframe’s to_numpy method.

It’s simple to use Pandas to export your array as well. If you are new to NumPy, you may want to create a Pandas dataframe from the values in your array and then write the dataframe to a CSV file with Pandas. If you have a two-dimensional array a, you could create a Pandas dataframe with pd.DataFrame(a), save it with the dataframe’s to_csv method, and read the CSV back with pd.read_csv.

You can also save your array with the NumPy savetxt function, passing delimiter=',' to produce comma-separated output. If you’re using the command line, you can print your saved CSV any time with a command such as cat followed by the file name. Or you can open the file any time with a text editor!

If you’re interested in learning more about Pandas, take a look at the official Pandas documentation. Learn how to install Pandas with the official Pandas installation information.
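The CSV round trip can also be done with NumPy alone. The following is a minimal sketch of the optional savetxt parameters and of genfromtxt handling a missing value; the file names table.csv and gappy.csv, the column names x and y, and the sample values are assumptions made up for illustration.

```python
import os
import tempfile

import numpy as np

table = np.array([[1.0, 2.5], [3.0, 4.5], [5.0, 6.5]])

with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, "table.csv")
    # The header is written as a comment line starting with "# ",
    # which np.loadtxt skips by default.
    np.savetxt(csv_path, table, delimiter=",", fmt="%.2f", header="x,y")
    loaded = np.loadtxt(csv_path, delimiter=",")

    # A file with an empty field: np.genfromtxt fills the gap with nan
    # instead of raising an error.
    gappy_path = os.path.join(tmp, "gappy.csv")
    with open(gappy_path, "w") as f:
        f.write("1.0,2.0\n3.0,\n")
    gappy = np.genfromtxt(gappy_path, delimiter=",")

print(loaded)
print(gappy)
```

Note that fmt controls how many digits are written, so a text round trip only preserves values up to that precision.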
Plotting arrays with Matplotlib

If you need to generate a plot for your values, it’s very simple with Matplotlib. For example, you may have a one-dimensional array named a. If you already have Matplotlib installed, you can import it with:

>>> import matplotlib.pyplot as plt

All you need to do to plot your values is run:

>>> plt.plot(a)

If you are running from a script or a plain command line rather than an interactive environment, you may also need to call plt.show() to display the figure.

For example, you can plot a 1D array against evenly spaced x values created with np.linspace, drawing it both as a line and as individual markers.

With Matplotlib, you have access to an enormous number of visualization options, such as histograms, scatter plots, and 3D surface plots.

To read more about Matplotlib and what it can do, take a look at the official documentation. For directions regarding installing Matplotlib, see the official installation section.
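As a minimal sketch of plotting a 1D array, the example below draws the same data as a line and as markers and writes the figure to a file. The data (x squared), the axis labels, the output file name plot.png, and the choice of the non-interactive Agg backend are all assumptions for illustration; in an interactive session you would normally skip the backend line and call plt.show() instead of saving.

```python
import os
import tempfile

import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no display is needed

import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 5, 20)  # 20 evenly spaced values from 0 to 5
y = x ** 2

fig, ax = plt.subplots()
ax.plot(x, y, color="purple")  # the data as a continuous line
ax.plot(x, y, "o")             # the same data as individual markers
ax.set_xlabel("x")
ax.set_ylabel("x squared")

with tempfile.TemporaryDirectory() as tmp:
    out_path = os.path.join(tmp, "plot.png")
    fig.savefig(out_path)
    saved = os.path.exists(out_path)

line_count = len(ax.lines)
plt.close(fig)
print(saved, line_count)
```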