Informatics Practices
Consider the following tables Item and Customer and answer the questions that follow:
Table: Item
| Item_ID | ItemName | Manufacturer | Price |
|---|---|---|---|
| PC01 | Personal Computer | HCL India | 42000 |
| LCO5 | Laptop | HP USA | 55000 |
| PCO3 | Personal Computer | Dell USA | 32000 |
| PC06 | Personal Computer | Zenith USA | 37000 |
| LCO3 | Laptop | Dell USA | 57000 |
Table: Customer
| Item_ID | CustomerName | City |
|---|---|---|
| LCO3 | N Roy | Delhi |
| PCO3 | H Singh | Mumbai |
| PC06 | R Pandey | Delhi |
| LCO3 | C Sharma | Chennai |
| PC01 | K Agarwal | Bengaluru |
Assume that the Pandas has been imported as pd.
(a) Create a dataframe called dfI for table Item.
(b) Create a dataframe called dfC for table Customer.
(c) Perform the default join operation on item_ID using two dataframes: dfI and dfC.
(d) Perform the left join operation on item_ID using two dataframes: dfI and dfC.
(e) Perform the right join operation on Item_ID using two dataframes: dfI and dfC.
(f) Perform the default operation on Item_ID using two dataframes: dfI and dfC with the left index as true.
(g) Perform the outer join operation on item_ID using two dataframes: dfI and dfC.
(h) Create a new dataframe dfN using dataframes: dfI and dfC. The new dataframe data will hold both left index and right index true values.
(i) Arrange the dataframe dfN in descending order of Price.
(j) Arrange the dataframe dfN in descending order of City and Price.
Answer
(a)
dfI = pd.DataFrame({
'Item_ID': ['PC01', 'LCO5', 'PCO3', 'PC06', 'LCO3'],
'ItemName': ['Personal Computer', 'Laptop', 'Personal Computer', 'Personal Computer', 'Laptop'],
'Manufacturer': ['HCL India', 'HP USA', 'Dell USA', 'Zenith USA', 'Dell USA'],
'Price': [42000, 55000, 32000, 37000, 57000]
})
Output
Item_ID ItemName Manufacturer Price
0 PC01 Personal Computer HCL India 42000
1 LCO5 Laptop HP USA 55000
2 PCO3 Personal Computer Dell USA 32000
3 PC06 Personal Computer Zenith USA 37000
4 LCO3 Laptop Dell USA 57000
(b)
dfC = pd.DataFrame({
'Item_ID': ['LCO3', 'PCO3', 'PC06', 'LCO3', 'PC01'],
'CustomerName': ['N Roy', 'H Singh', 'R Pandey', 'C Sharma', 'K Agarwal'],
'City': ['Delhi', 'Mumbai', 'Delhi', 'Chennai', 'Bengaluru']
})
Output
Item_ID CustomerName City
0 LCO3 N Roy Delhi
1 PCO3 H Singh Mumbai
2 PC06 R Pandey Delhi
3 LCO3 C Sharma Chennai
4 PC01 K Agarwal Bengaluru
(c)
df_default_join = pd.merge(dfI, dfC, on = 'Item_ID')
Output
Item_ID ItemName Manufacturer Price CustomerName City
0 PC01 Personal Computer HCL India 42000 K Agarwal Bengaluru
1 PCO3 Personal Computer Dell USA 32000 H Singh Mumbai
2 PC06 Personal Computer Zenith USA 37000 R Pandey Delhi
3 LCO3 Laptop Dell USA 57000 N Roy Delhi
4 LCO3 Laptop Dell USA 57000 C Sharma Chennai
(d)
df_left_join = pd.merge(dfI, dfC, left_on='Item_ID', right_on='Item_ID', how='left')
Output
Item_ID ItemName Manufacturer Price CustomerName City
0 PC01 Personal Computer HCL India 42000 K Agarwal Bengaluru
1 LCO5 Laptop HP USA 55000 NaN NaN
2 PCO3 Personal Computer Dell USA 32000 H Singh Mumbai
3 PC06 Personal Computer Zenith USA 37000 R Pandey Delhi
4 LCO3 Laptop Dell USA 57000 N Roy Delhi
5 LCO3 Laptop Dell USA 57000 C Sharma Chennai
(e)
df_right_join = pd.merge(dfI, dfC, left_on='Item_ID', right_on='Item_ID', how='right')
Output
Item_ID ItemName Manufacturer Price CustomerName City
0 LCO3 Laptop Dell USA 57000 N Roy Delhi
1 PCO3 Personal Computer Dell USA 32000 H Singh Mumbai
2 PC06 Personal Computer Zenith USA 37000 R Pandey Delhi
3 LCO3 Laptop Dell USA 57000 C Sharma Chennai
4 PC01 Personal Computer HCL India 42000 K Agarwal Bengaluru
(f) This operation is not possible because the left_index parameter is used for merging on the index of the dataframes, not on a specific column. The default merge operation is an inner join, and it requires a common column to merge on.
(g)
df_outer_join = pd.merge(dfI, dfC, on='Item_ID', how='outer')
Output
Item_ID ItemName Manufacturer Price CustomerName City
0 LCO3 Laptop Dell USA 57000 N Roy Delhi
1 LCO3 Laptop Dell USA 57000 C Sharma Chennai
2 LCO5 Laptop HP USA 55000 NaN NaN
3 PC01 Personal Computer HCL India 42000 K Agarwal Bengaluru
4 PC06 Personal Computer Zenith USA 37000 R Pandey Delhi
5 PCO3 Personal Computer Dell USA 32000 H Singh Mumbai
(h)
dfN = pd.merge(dfI, dfC, left_index=True, right_index=True)
Output
Item_ID_x ItemName Manufacturer Price Item_ID_y CustomerName City
0 PC01 Personal Computer HCL India 42000 LCO3 N Roy Delhi
1 LCO5 Laptop HP USA 55000 PCO3 H Singh Mumbai
2 PCO3 Personal Computer Dell USA 32000 PC06 R Pandey Delhi
3 PC06 Personal Computer Zenith USA 37000 LCO3 C Sharma Chennai
4 LCO3 Laptop Dell USA 57000 PC01 K Agarwal Bengaluru
(i)
dfN_sorted_price = dfN.sort_values(by='Price', ascending=False)
Output
Item_ID_x ItemName Manufacturer Price Item_ID_y CustomerName City
4 LCO3 Laptop Dell USA 57000 PC01 K Agarwal Bengaluru
1 LCO5 Laptop HP USA 55000 PCO3 H Singh Mumbai
0 PC01 Personal Computer HCL India 42000 LCO3 N Roy Delhi
3 PC06 Personal Computer Zenith USA 37000 LCO3 C Sharma Chennai
2 PCO3 Personal Computer Dell USA 32000 PC06 R Pandey Delhi
(j)
dfN_sorted_city_price = dfN.sort_values(by=['City', 'Price'], ascending=[False, False])
Output
Item_ID_x ItemName Manufacturer Price Item_ID_y CustomerName City
1 LCO5 Laptop HP USA 55000 PCO3 H Singh Mumbai
0 PC01 Personal Computer HCL India 42000 LCO3 N Roy Delhi
2 PCO3 Personal Computer Dell USA 32000 PC06 R Pandey Delhi
3 PC06 Personal Computer Zenith USA 37000 LCO3 C Sharma Chennai
4 LCO3 Laptop Dell USA 57000 PC01 K Agarwal Bengaluru
Related Questions
Write commands to print following details of a Series object seal :
(a) if the series is empty
(b) indexes of the series
(c) The data type of underlying data
(d) if the series stores any NaN values
How do you fill all missing values with previous non-missing values?
Consider the following series object namely S:
0 0.430271 1 0.617328 2 0.265421 3 0.836113 dtype: float64What will be returned by the following statements?
(a) S * 100
(b) S > 0
(c) S1 = pd.Series(S)
(d) S3 = pd.Series(S1) + 3
What will be the output produced by the following code?
Stationery = ['pencils', 'notebooks', 'scales', 'erasers'] S = pd.Series([20, 33, 52, 10], index = Stationery) S2 = pd.Series([17, 13, 31, 32], index = Stationery) print(S == S2) S = S + S2 print(S)