As far as I know, there's no way to avoid reading the whole file into memory.
gci "C:\location" -Filter *.csv | % {
    (Get-Content $_.FullName | Select-Object -Skip 3) | Set-Content $_.FullName
    Add-Content -Path $_.FullName -Value ""
}
This PowerShell solution requires loading the whole file into memory. It will:
- search for every CSV in a location with `gci`,
- loop over the found CSV files with `foreach` (alias `%`),
- get their whole content (this can take some time) with `Get-Content`,
- select everything except the first 3 lines with `Select-Object -Skip 3`,
- and write that content back to the file with `Set-Content`.
- The last line appends a trailing newline to the file with `Add-Content`.
Edit: You can try to make this whole thing faster by adding the `-ReadCount` parameter to your `Get-Content` call.
-ReadCount (int)
Specifies how many lines of content are sent through the pipeline at a
time. The default value is 1. A value of 0 (zero) sends all of the
content at one time.
This parameter does not change the content displayed, but it does
affect the time it takes to display the content. As the value of
ReadCount increases, the time it takes to return the first line
increases, but the total time for the operation decreases. This can
make a perceptible difference in very large items.
Edit 2: I tested `Get-Content` with `-ReadCount`. Sadly, the largest text file I could find was 89 MB, but the difference is already significant:
PS C:\Windows\System32> Measure-Command { gc "C:\Pub.log" -readcount 0 }
Days : 0
Hours : 0
Minutes : 0
Seconds : 1
Milliseconds : 22
Ticks : 10224578
TotalDays : 1.18340023148148E-05
TotalHours : 0.000284016055555556
TotalMinutes : 0.0170409633333333
TotalSeconds : 1.0224578
TotalMilliseconds : 1022.4578
PS C:\Windows\System32> Measure-Command { gc "C:\Pub.log" -readcount 1 }
Days : 0
Hours : 0
Minutes : 0
Seconds : 10
Milliseconds : 594
Ticks : 105949457
TotalDays : 0.000122626686342593
TotalHours : 0.00294304047222222
TotalMinutes : 0.176582428333333
TotalSeconds : 10.5949457
TotalMilliseconds : 10594.9457
So `Get-Content $_.FullName -ReadCount 0` is the way to go.
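Putting it together, a sketch of the same loop with `-ReadCount 0` (assuming the same `C:\location` path as above). One caveat I should point out: with `-ReadCount 0`, `Get-Content` sends all lines down the pipeline as a single array, so the call needs its own set of parentheses to enumerate that array first — otherwise `Select-Object -Skip 3` would skip array objects rather than lines:

```powershell
gci "C:\location" -Filter *.csv | % {
    # -ReadCount 0 emits all lines as one array (faster); the inner
    # parentheses enumerate it so -Skip 3 skips lines, not arrays.
    # The outer parentheses finish reading before Set-Content writes.
    ((Get-Content $_.FullName -ReadCount 0) | Select-Object -Skip 3) |
        Set-Content $_.FullName
    Add-Content -Path $_.FullName -Value ""
}
```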
Not an answer to the question asked, but if you ever get to refactor: This is what a database does quite well. – Hennes – 2016-12-01T14:55:39.723
@Hennes: That would work if the first 3 lines were actual data lines, but they are random text. Edited my question to make this clearer. I formulated it badly earlier... – Wouter – 2016-12-01T15:00:30.827
Ah. I see plenty of solutions which include reading the whole file (I searched on "trim beginning of a file"). It will be interesting what comes up for Windows which does not read the unchanged parts. – Hennes – 2016-12-01T15:03:56.267
Editing by loading into RAM would of course be trivial :) – Wouter – 2016-12-01T15:08:02.467
Have you tried `Get-Content` along with `Set-Content` to get the first 3 lines and/or BaseStream and read/replace 3 lines? Unless you read in the entire file, neither suggestion would result in the entire file being read into memory. – Ramhound – 2016-12-01T16:32:12.943
Put GNU/Linux in a virtual machine and run it there – Neil McGuigan – 2016-12-01T20:13:42.140
You might be able to hex edit the 3 lines (or script a routine) into a proper row format with the proper number of field and record delimiters, and then use `FIRSTROW`. This would only require seeking a few bytes into the file. – Yorik – 2016-12-01T21:17:28.293