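Introduction:

I have been given a CSV file in which the field separator is the pipe character (i.e. |).

This file has a predefined number of fields (say N). I can discover the value of N by reading the header of the CSV file, which we can assume to be correct.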
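As a minimal sketch of that step, assuming the header line itself contains no embedded newline, N could be read by letting awk split the first line on the pipe. The name `count_fields` is a hypothetical helper used only for illustration.

```sh
# count_fields FILE: print the number of pipe-separated fields in the header line.
# Hypothetical helper; assumes the header is a single, correct line.
count_fields() {
    awk -F '|' 'NR == 1 { print NF; exit }' "$1"
}
```

It would be called as `count_fields data.csv`, where `data.csv` stands in for the real file name.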

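Problem:

Some of the fields contain a newline by mistake, which makes the line look shorter than required (i.e. it has M fields, where M < N).

I tried to fix this with the following script: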

if [ $# -ne 1 ]
then
    echo "Usage: $0 <file>"
    exit
fi

# get first line
first_line=$(head -n 1 "$1")

# get number of fields
num_separators=$(echo "$first_line" | tr -d -c '|' | awk '{print length}')

cat "$1" | awk -v numFields=$(( num_separators + 1 )) -F '|' '
{
    totRecords = NF/numFields
    # loop over lines
    for (record=0; record < totRecords; record++) {
        output = ""
        # loop over fields
        for (i=0; i < numFields; i++) {
            j = (numFields*record)+i+1
            # replace newline with question mark
            sub("\n", "?", $j)
            output = output (i > 0 ? "|" : "") $j
        }
        print output
    }
}
'

However, the newlines are still there.

How can I fix this?

CSV example:

FIRST_NAME|LAST_NAME|NOTES
John|Smith|This is a field with a
newline
Foo|Bar|Baz

Expected output:

FIRST_NAME|LAST_NAME|NOTES
John|Smith|This is a field with a * newline
Foo|Bar|Baz

* I don't care about the replacement: it could be a space, a question mark, or anything else except a newline or a pipe (which would create a new field).
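One possible direction, sketched under stated assumptions rather than as a definitive fix: with the default record separator, awk hands the script one physical line at a time, so a field never contains "\n" and there is nothing for sub() to replace. The sketch below instead glues physical lines together until a record has as many fields as the header. It relies on a heuristic that is an assumption, not something the file guarantees: once a record is complete, a following line without any pipe is treated as a continuation of the last field. It also assumes N > 1. The names `fix_csv` and `rep` are hypothetical.

```sh
# fix_csv FILE: join broken lines so every record has as many fields as the header.
# Hypothetical helper; "?" is an arbitrary replacement for the stray newline.
fix_csv() {
    awk -F '|' -v rep='?' '
        NR == 1 { n = NF; print; next }      # header gives the expected field count N
        {
            m = (NF > 0 ? NF : 1)            # an empty line counts as one empty field
            if (have && (nf < n || m == 1)) {
                # record still short, or (heuristic) a pipe-less continuation line
                buf = buf rep $0
                nf += m - 1                  # joining merges two partial fields into one
            } else {
                if (have) print buf          # previous record is complete
                buf = $0; nf = m; have = 1   # start a new record
            }
        }
        END { if (have) print buf }          # flush the last record
    ' "$1"
}
```

Run as `fix_csv data.csv > fixed.csv` (both file names are placeholders). A record whose first field is broken by a newline cannot be told apart from a continuation of the previous record's last field, so the result would need checking on real data.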
