Sorting out redundant text from screen scraper application

3

You are writing a screen scraper application monitoring a text-only chat window. Text is added at the bottom of the window.

The application takes a screenshot of the chat window. If a change has occurred since the last screenshot (new_screenshot != old_screenshot), the screenshot is saved.

After X time, all images are merged into one image, with the oldest image at the top. This large image is sent to a server for OCR, and a string of text is returned.

Challenge: Sort out redundant text.

Example: The chat window is 5 lines high and is initially empty. The solution must work with both an empty and a non-empty initial chat window. More than one line can be added between two screenshots.

Input to algorithm:

1 Lorem ipsum dolor sit amet,
1 Lorem ipsum dolor sit amet,
2 consectetur adipiscing elit
1 Lorem ipsum dolor sit amet,
2 consectetur adipiscing elit
3 Mauris porttitor enim sed tincidunt interdum.
1 Lorem ipsum dolor sit amet,
2 consectetur adipiscing elit
3 Mauris porttitor enim sed tincidunt interdum.
4 Morbi elementum erat nec nulla auctor, eget porta odio aliquet.
1 Lorem ipsum dolor sit amet,
2 consectetur adipiscing elit
3 Mauris porttitor enim sed tincidunt interdum.
4 Morbi elementum erat nec nulla auctor, eget porta odio aliquet.
5 Nam aliquet velit vel elementum tristique.
2 consectetur adipiscing elit
3 Mauris porttitor enim sed tincidunt interdum.
4 Morbi elementum erat nec nulla auctor, eget porta odio aliquet.
5 Nam aliquet velit vel elementum tristique.
6 Donec ac tincidunt urna.
3 Mauris porttitor enim sed tincidunt interdum.
4 Morbi elementum erat nec nulla auctor, eget porta odio aliquet.
5 Nam aliquet velit vel elementum tristique.
6 Donec ac tincidunt urna.
7 Proin pretium, metus non porttitor lobortis, tortor sem rhoncus urna
4 Morbi elementum erat nec nulla auctor, eget porta odio aliquet.
5 Nam aliquet velit vel elementum tristique.
6 Donec ac tincidunt urna.
7 Proin pretium, metus non porttitor lobortis, tortor sem rhoncus urna
8 quis finibus leo lorem sed lacus.
5 Nam aliquet velit vel elementum tristique.
6 Donec ac tincidunt urna.
7 Proin pretium, metus non porttitor lobortis, tortor sem rhoncus urna
8 quis finibus leo lorem sed lacus.
1 Lorem ipsum dolor sit amet,

Expected result:

1 Lorem ipsum dolor sit amet,
2 consectetur adipiscing elit
3 Mauris porttitor enim sed tincidunt interdum.
4 Morbi elementum erat nec nulla auctor, eget porta odio aliquet.
5 Nam aliquet velit vel elementum tristique.
6 Donec ac tincidunt urna.
7 Proin pretium, metus non porttitor lobortis, tortor sem rhoncus urna
8 quis finibus leo lorem sed lacus.
1 Lorem ipsum dolor sit amet,
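
A minimal sketch of one possible approach, not a confirmed solution to the challenge. It assumes the individual screenshots are available as separate lists of lines, whereas the challenge only supplies one merged string, so splitting that string back into screens is not covered here. It also assumes the longest overlap between consecutive screens is the right one, i.e. that as few new lines as possible were added. The names `overlap` and `merge_screens` are made up for illustration.

```python
from typing import List, Sequence


def overlap(prev: Sequence[str], new: Sequence[str]) -> int:
    """Length of the longest suffix of `prev` that is also a prefix of `new`."""
    for k in range(min(len(prev), len(new)), 0, -1):
        if list(prev[-k:]) == list(new[:k]):
            return k
    return 0


def merge_screens(screens: Sequence[Sequence[str]]) -> List[str]:
    """Rebuild the chat log from successive screenshots, oldest first."""
    log: List[str] = []
    prev: Sequence[str] = []
    for screen in screens:
        # Lines already seen on the previous screen are skipped;
        # only the part after the overlap is new.
        k = overlap(prev, screen)
        log.extend(screen[k:])
        prev = screen
    return log


# The example above, with each line abbreviated to its number.
screens = [
    ["1"],
    ["1", "2"],
    ["1", "2", "3"],
    ["1", "2", "3", "4"],
    ["1", "2", "3", "4", "5"],
    ["2", "3", "4", "5", "6"],
    ["3", "4", "5", "6", "7"],
    ["4", "5", "6", "7", "8"],
    ["5", "6", "7", "8", "1"],
]
print(merge_screens(screens))
# ['1', '2', '3', '4', '5', '6', '7', '8', '1']
```

Comparing against the previous screen rather than against the whole log keeps the repeated final line: it is new relative to the screen before it, so it is appended again.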

UPDATE: I forgot an important detail in the original challenge: the same line can appear multiple times, but never twice in a row, so just deduplicating is not enough.
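
A small illustration of why deduplication falls short, using a made-up example (not the one from the challenge): a window 2 lines high in which the lines `a`, `b`, `c`, `a` arrive one at a time, giving the screens `[a]`, `[a, b]`, `[b, c]`, `[c, a]`.

```python
# Flattened OCR output for the four hypothetical screens above.
flat = ["a", "a", "b", "b", "c", "c", "a"]

# What the chat actually contained.
expected = ["a", "b", "c", "a"]

# Order-preserving deduplication keeps only the first occurrence of each line.
deduplicated = list(dict.fromkeys(flat))

print(deduplicated)              # ['a', 'b', 'c']
print(deduplicated == expected)  # False
```

The second `a` is a genuine new message, but deduplication cannot tell it apart from a repeat caused by overlapping screenshots.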

Vingtoft

Posted 2018-08-30T16:36:04.973

Reputation: 131

Question was closed 2018-08-30T20:18:17.970

Welcome to PPCG! This site is a language-agnostic challenge site, it's not like StackOverflow. I've tried to edit your question to stay on topic. If you are looking for a solution to a problem you cannot solve, this may not be the site for you. – Stephen – 2018-08-30T16:53:06.803

Hi Vingtoft, and welcome to PPCG! As it stands, this question would likely be a better fit for StackOverflow. That said, if you add a win condition (usually [tag:code-golf]), this could make for a good challenge. In the future, we also recommend running questions through the sandbox before posting here. – None – 2018-08-30T16:55:46.123

Hi, sorry for the violation of community guidelines. Feel free to delete my challenge. – Vingtoft – 2018-08-30T16:57:04.453

@Vingtoft No worries. I hope we'll see you again for future challenges. Incidentally, what you're looking for is probably sorted(set(text.split('\n'))). – None – 2018-08-30T17:02:13.803

Em, the challenge looks ok to me... – Luis felipe De jesus Munoz – 2018-08-30T17:13:56.740

Am I missing something or is this just split on newlines and deduplicate? – Shaggy – 2018-08-30T17:56:42.297

@Shaggy: I have updated my question. One line can appear multiple times (but never two times in a row), so just deduplicating would not be enough. Sorry for the inadequate initial description. – Vingtoft – 2018-08-30T18:05:14.320

What is the input? The height and the merged screenshot? – Erik the Outgolfer – 2018-08-30T18:40:42.957

Also, in the example, the first and second lines are identical. What do you mean by saying that the same line can't come twice in a row? – Erik the Outgolfer – 2018-08-30T18:43:24.147

The same line will never come twice in a row in "expected result", but it might occur in input. Yes, window height is known and can be used as input. – Vingtoft – 2018-08-30T18:49:11.700

So... the input in your example comprises 7 blocks (screens) of 5 lines of text? And you are logging the additions from one screen to the next? And line 1 is repeated because it appears in [5,6,7,8,1] but not in [4,5,6,7,8] (the previous screen)? – JayCe – 2018-08-30T19:36:57.527

So the output is the result of seven comparisons which in sequence add: [1,2] [3,4] [5] [6] [7] [8] [1 ?] – JayCe – 2018-08-30T19:39:55.863

Hi and welcome to PPCG. Generally it is frowned upon to change the question after answers have been given, especially if it invalidates existing answers. I just voted to close due to the question now being unclear. I'll delete my own answer for now, until I can understand exactly what is being asked. – Digital Trauma – 2018-08-30T19:42:01.187

@DigitalTrauma I understand, apologies, this is my first question on this site. – Vingtoft – 2018-08-30T20:15:05.227

1. You should post challenges in the sandbox before posting on the main site. 2. All challenges must have an objective winning criterion (see [help/on-topic]). 3. You should absolutely avoid writing posts on this site like a help request: it's a challenge, not a question, not a task, not an assignment. 4. See "When is EDIT/UPDATE appropriate in a post?" 5. See the formatting help; you should format data as code. – user202729 – 2018-08-31T06:33:50.767

@Stephen See "On editing somebody else's off-topic question to make it on-topic without the owner's word" (Programming Puzzles & Code Golf Meta Stack Exchange). Instead, tell OP how to edit the post in the comment section. – user202729 – 2018-08-31T06:34:54.160

Answers

0

Japt -R, 11 bytes

This passes the test case but the spec still isn't clear enough

    ·ò¨ mÌò¦ mg

Try it

Shaggy

Posted 2018-08-30T16:36:04.973

Reputation: 24 623