Answer a question

I have a Pandas DataFrame in which one of the columns contains string elements, and those string elements contain new lines that I would like to print literally. But they just appear as \n in the output.

That is, I want to print this:

  pos     bidder
0   1
1   2
2   3  <- alice
       <- bob
3   4

but this is what I get:

  pos            bidder
0   1
1   2
2   3  <- alice\n<- bob
3   4

How can I accomplish what I want? Can I use a DataFrame, or will I have to revert to manually printing padded columns one row at a time?

Here's what I have so far:

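import pandas as pd

n = 4
output = pd.DataFrame({
    'pos': range(1, n+1),
    'bidder': [''] * n
})
bids = {'alice': 3, 'bob': 3}
used_pos = []
for bidder, pos in bids.items():
    row = pos - 1  # row labels start at 0, positions start at 1
    if pos in used_pos:
        # Position already has a bidder: append the next one on a new line.
        arrow = output.loc[row, 'bidder']
        output.loc[row, 'bidder'] = arrow + "\n<- %s" % bidder
    else:
        output.loc[row, 'bidder'] = "<- %s" % bidder
        used_pos.append(pos)
print(output)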

Answers

From the pandas.DataFrame documentation:

Two-dimensional size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns). Arithmetic operations align on both row and column labels. Can be thought of as a dict-like container for Series objects. The primary pandas data structure

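So you can't have a row without an index label, and each row is rendered on a single line. A newline ("\n") inside a cell is therefore shown escaped, not as a line break, when the DataFrame is printed.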
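To illustrate the point, here is a minimal sketch with a hypothetical one-row frame. It assumes the column is stored with object dtype; other string dtypes may render differently. The cell still holds a real newline, and only the table rendering shows it escaped.

```python
import pandas as pd

# Hypothetical one-row frame; object dtype keeps the cell a plain Python str.
df = pd.DataFrame({'pos': [3],
                   'bidder': pd.Series(['<- alice\n<- bob'], dtype=object)})

table_text = df.to_string()   # roughly what print(df) shows
cell = df.loc[0, 'bidder']    # the stored value itself

print(table_text)  # header plus one data row; the newline is shown escaped
print(cell)        # printed on two lines, because the value holds a real newline
```

So the data is intact, and only the tabular display keeps each row on one line.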

You could overwrite 'pos' with an empty value, and output the next 'bidder' on the next row. But then index and 'pos' would be offset every time you do that. Like:

  pos    bidder
0   1          
1   2          
2   3  <- alice
3        <- bob
4   5   

So if a bidder called 'frank' had 4 as its value, it would overwrite 'bob'. This would cause more problems as you add bidders. It is probably possible to use a DataFrame and write code to work around this issue, but it is probably worth looking into other solutions.

Here is the code to produce the output structure above.

import pandas as pd

n = 5
# Keep 'pos' as object dtype so individual cells can be blanked out later.
output = pd.DataFrame({'pos': pd.Series(list(range(1, n + 1)), dtype=object),
                       'bidder': [''] * n},
                      columns=['pos', 'bidder'])
bids = {'alice': 3, 'bob': 3}
used_pos = []
for bidder, pos in bids.items():
    if pos in used_pos:
        # Position already taken: use the next row and blank its 'pos' cell.
        output.loc[pos, 'bidder'] = "<- %s" % bidder
        output.loc[pos, 'pos'] = ''
    else:
        output.loc[pos - 1, 'bidder'] = "<- %s" % bidder
        used_pos.append(pos)
print(output)
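A different workaround, sketched here as an assumption and not as part of the code above, is to keep the newline in the data and split it into continuation rows only for display. The helper name `expand_multiline` is hypothetical. Continuation rows get an empty index label and empty values in the other columns, so 'pos' and the index are not shifted.

```python
import pandas as pd

def expand_multiline(df, column):
    """Return a display copy of df where each newline in df[column]
    starts a continuation row with an empty label and empty other cells."""
    rows, labels = [], []
    for label, row in df.iterrows():
        parts = str(row[column]).split('\n')
        for i, part in enumerate(parts):
            # First part keeps the original row; later parts are blank rows.
            new_row = row.to_dict() if i == 0 else {c: '' for c in df.columns}
            new_row[column] = part
            rows.append(new_row)
            labels.append(str(label) if i == 0 else '')
    return pd.DataFrame(rows, columns=df.columns, index=labels)

output = pd.DataFrame({'pos': [1, 2, 3, 4],
                       'bidder': ['', '', '<- alice\n<- bob', '']})
expanded = expand_multiline(output, 'bidder')

# Pad the strings so the column reads left-aligned in the printed table.
width = expanded['bidder'].str.len().max()
print(expanded.to_string(formatters={'bidder': lambda s: s.ljust(width)}))
```

The expanded frame is meant for printing only. The original `output` keeps one row per position, so later lookups by position still work.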

Edit:

Another option is to restructure the data and the output. You could use the positions as columns and create a new row for each key (person) in the data. The code example below prints the DataFrame with NaN values replaced by empty strings.

import pandas as pd

data = {'johnny\nnewline': 2, 'alice': 3, 'bob': 3,
        'frank': 4, 'lisa': 1, 'tom': 8}
n = range(1, max(data.values()) + 1)

# Create DataFrame with columns = pos
output = pd.DataFrame(columns=n, index=[])

# Populate DataFrame with rows
for index, (bidder, pos) in enumerate(data.items()):
    output.loc[index, pos] = bidder

# Print the DataFrame and remove NaN to make it easier to read.
print(output.fillna(''))

# Fetch and print every element in column 2 (empty cells print as NaN).
# Printing a single element renders its newline literally.
for index in output.index:
    print(output.loc[index, 2])

It depends what you want to do with the data though. Good luck :)
