Compress a sparse matrix

18

Compress a sparse matrix using Compressed sparse row (CSR, CRS or Yale format).

These are all the same form of compression (ignore new Yale).

Input may be any 2d data structure (list of lists, etc): e.g

[[0 0 0 0],
 [5 8 0 0],
 [0 0 3 0],
 [0 6 0 0]]

And the output should be three 1d data structures (list etc), that denote the outputs A, IA and JA, for example

[5, 8, 3, 6]
[0, 0, 2, 3, 4]
[0, 1, 2, 1,]

The process is described by wikipedia:

  • The array A is of length NNZ and holds all the nonzero entries of M in left-to-right top-to-bottom ("row-major") order.

  • The array IA is of length m + 1. It is defined by this recursive definition:

    • IA[0] = 0 IA[i] = IA[i − 1] + (number of nonzero elements on the (i − 1)-th row in the original matrix)

    • Thus, the first m elements of IA store the index into A of the first nonzero element in each row of M, and the last element IA[m] stores NNZ, the number of elements in A, which can be also thought of as the index in A of first element of a phantom row just beyond the end of the matrix M. The values of the i-th row of the original matrix is read from the elements A[IA[i]] to A[IA[i + 1] − 1] (inclusive on both ends), i.e. from the start of one row to the last index just before the start of the next.[5]

    • The third array, JA, contains the column index in M of each element of A and hence is of length NNZ as well.

If your language doesn't support actual data structures, input and output may be text.

Test cases

Input 1:

[[0 0 0 0],
 [5 8 0 0],
 [0 0 3 0],
 [0 6 0 0]]

Output 1:

[ 5, 8, 3, 6 ]
[ 0, 0, 2, 3, 4 ]
[ 0, 1, 2, 1, ]

Input 2

[[10 20 0 0 0 0],
 [0 30 0 40 0 0],
 [0 0 50 60 70 0],
 [0 0 0 0 0 80]]

Output 2:

[ 10 20 30 40 50 60 70 80 ]
[  0  2  4  7  8 ]
[  0  1  1  3  2  3  4  5 ]

Input 3:

[[0 0 0],
 [0 0 0],
 [0 0 0]]

Output 3:

[ ]
[ 0 0 0 0 ]
[ ]

Input 4:

[[1 1 1],
 [1 1 1],
 [1 1 1]]

Output 4:

[ 1 1 1 1 1 1 1 1 1 ]
[ 0 3 6 9 ]
[ 0 1 2 0 1 2 0 1 2 ]

Input 5:

[[0 0 0 0],
 [5 -9 0 0],
 [0 0 0.3 0],
 [0 -400 0 0]]

Output 5:

[ 5, -9, 0.3, -400 ]
[ 0, 0, 2, 3, 4 ]
[ 0, 1, 2, 1, ]

Assume inputs may contain any real number, you need not consider mathematical symbols or exponential representation (e.g. 5,000 will never be entered as 5e3). You will not need to handle inf, -inf, NaN or any other 'pseudo-numbers'. You may output a different representation of the number (5,000 may be output as 5e3 if you so choose).

Scoring:

This is a , fewest bytes wins.

Leaderboards

Here is a Stack Snippet to generate both a regular leaderboard and an overview of winners by language.

To make sure that your answer shows up, please start your answer with a headline, using the following Markdown template:

# Language Name, N bytes

where N is the size of your submission. If you improve your score, you can keep old scores in the headline, by striking them through. For instance:

# Ruby, <s>104</s> <s>101</s> 96 bytes

If there you want to include multiple numbers in your header (e.g. because your score is the sum of two files or you want to list interpreter flag penalties separately), make sure that the actual score is the last number in the header:

# Perl, 43 + 2 (-p flag) = 45 bytes

You can also make the language name a link which will then show up in the leaderboard snippet:

# [><>](http://esolangs.org/wiki/Fish), 121 bytes

var QUESTION_ID=129924,OVERRIDE_USER=8478;function answersUrl(e){return"https://api.stackexchange.com/2.2/questions/"+QUESTION_ID+"/answers?page="+e+"&pagesize=100&order=desc&sort=creation&site=codegolf&filter="+ANSWER_FILTER}function commentUrl(e,s){return"https://api.stackexchange.com/2.2/answers/"+s.join(";")+"/comments?page="+e+"&pagesize=100&order=desc&sort=creation&site=codegolf&filter="+COMMENT_FILTER}function getAnswers(){jQuery.ajax({url:answersUrl(answer_page++),method:"get",dataType:"jsonp",crossDomain:!0,success:function(e){answers.push.apply(answers,e.items),answers_hash=[],answer_ids=[],e.items.forEach(function(e){e.comments=[];var s=+e.share_link.match(/\d+/);answer_ids.push(s),answers_hash[s]=e}),e.has_more||(more_answers=!1),comment_page=1,getComments()}})}function getComments(){jQuery.ajax({url:commentUrl(comment_page++,answer_ids),method:"get",dataType:"jsonp",crossDomain:!0,success:function(e){e.items.forEach(function(e){e.owner.user_id===OVERRIDE_USER&&answers_hash[e.post_id].comments.push(e)}),e.has_more?getComments():more_answers?getAnswers():process()}})}function getAuthorName(e){return e.owner.display_name}function process(){var e=[];answers.forEach(function(s){var r=s.body;s.comments.forEach(function(e){OVERRIDE_REG.test(e.body)&&(r="<h1>"+e.body.replace(OVERRIDE_REG,"")+"</h1>")});var a=r.match(SCORE_REG);a&&e.push({user:getAuthorName(s),size:+a[2],language:a[1],link:s.share_link})}),e.sort(function(e,s){var r=e.size,a=s.size;return r-a});var s={},r=1,a=null,n=1;e.forEach(function(e){e.size!=a&&(n=r),a=e.size,++r;var t=jQuery("#answer-template").html();t=t.replace("{{PLACE}}",n+".").replace("{{NAME}}",e.user).replace("{{LANGUAGE}}",e.language).replace("{{SIZE}}",e.size).replace("{{LINK}}",e.link),t=jQuery(t),jQuery("#answers").append(t);var o=e.language;/<a/.test(o)&&(o=jQuery(o).text()),s[o]=s[o]||{lang:e.language,user:e.user,size:e.size,link:e.link}});var t=[];for(var o in s)s.hasOwnProperty(o)&&t.push(s[o]);t.sort(function(e,s){return e.lang>s.lang?1:e.lang<s.lang?-1:0});for(var c=0;c<t.length;++c){var i=jQuery("#language-template").html(),o=t[c];i=i.replace("{{LANGUAGE}}",o.lang).replace("{{NAME}}",o.user).replace("{{SIZE}}",o.size).replace("{{LINK}}",o.link),i=jQuery(i),jQuery("#languages").append(i)}}var ANSWER_FILTER="!t)IWYnsLAZle2tQ3KqrVveCRJfxcRLe",COMMENT_FILTER="!)Q2B_A2kjfAiU78X(md6BoYk",answers=[],answers_hash,answer_ids,answer_page=1,more_answers=!0,comment_page;getAnswers();var SCORE_REG=/<h\d>\s*([^\n,]*[^\s,]),.*?(\d+)(?=[^\n\d<>]*(?:<(?:s>[^\n<>]*<\/s>|[^\n<>]+>)[^\n\d<>]*)*<\/h\d>)/,OVERRIDE_REG=/^Override\s*header:\s*/i;
body{text-align:left!important}#answer-list,#language-list{padding:10px;width:290px;float:left}table thead{font-weight:700}table td{padding:5px}
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script> <link rel="stylesheet" type="text/css" href="//cdn.sstatic.net/codegolf/all.css?v=83c949450c8b"> <div id="answer-list"> <h2>Leaderboard</h2> <table class="answer-list"> <thead> <tr><td></td><td>Author</td><td>Language</td><td>Size</td></tr></thead> <tbody id="answers"> </tbody> </table> </div><div id="language-list"> <h2>Winners by Language</h2> <table class="language-list"> <thead> <tr><td>Language</td><td>User</td><td>Score</td></tr></thead> <tbody id="languages"> </tbody> </table> </div><table style="display: none"> <tbody id="answer-template"> <tr><td>{{PLACE}}</td><td>{{NAME}}</td><td>{{LANGUAGE}}</td><td>{{SIZE}}</td><td><a href="{{LINK}}">Link</a></td></tr></tbody> </table> <table style="display: none"> <tbody id="language-template"> <tr><td>{{LANGUAGE}}</td><td>{{NAME}}</td><td>{{SIZE}}</td><td><a href="{{LINK}}">Link</a></td></tr></tbody> </table>

Pureferret

Posted 2017-07-05T11:31:44.427

Reputation: 960

Could 1-based indices be used for the last row? – Leo – 2017-07-05T16:30:20.493

@Leo for JA? No. – Pureferret – 2017-07-05T16:50:56.307

1Isn't IA[0] = 0 completely unnecessary? It's only needed to define IA[i] = IA[i − 1]..., yet we could simply state that if i-1 < 0 to use 0. That is, IA[0] is always equal to 0, therefor it can be compressed out (yes, I realize that this is a critique of the algorithm, not this challenge). – Draco18s no longer trusts SE – 2017-07-05T21:01:00.450

Will we have the inverse challenge too? – Adám – 2017-07-05T21:42:31.573

@Draco18s I beleive that's what distinguishes Yale and new Yale formats. – Pureferret – 2017-07-06T07:51:09.503

@Adám I would like to think so – Pureferret – 2017-07-06T07:51:22.290

1Neat! Hadn't run into either format before, but I'm glad to see someone else did see that before (I shouldn't be the kind of person who spots trivial optimizations in algorithms this old). – Draco18s no longer trusts SE – 2017-07-06T13:03:27.217

Answers

6

MATL, 19 bytes

!3#f!Dx0Gg!XsYshDq!

Input uses ; as row separator.

Try it online! Or verify all test cases: 1, 2, 3, 4, 5.

Explanation

!     % Implicit input. Transpose
3#f   % 3-output version of find: it takes all nonzero values and pushes
      % their column indices, row indices, and values, as column vectors
!     % Transpose into a row vector
D     % Display (and pop) vector of values
x     % Delete vector of row values
0     % Push 0
G     % Push input
g     % Convert to logical: nonzeros become 1
!     % Transpose
Xs    % Sum of columns. Gives a row vector
Ys    % Cumulative sum
h     % Prepend the 0 that's below on the stack
D     % Display (and pop) that vector
q     % Subtract 1 from the vector of row indices
!     % Transpose into a row vector. Implicitly display

Luis Mendo

Posted 2017-07-05T11:31:44.427

Reputation: 87 464

5

Mathematica, 78 bytes

{a=SparseArray@#;a@"NonzeroValues",a@"RowPointers",Join@@a@"ColumnIndices"-1}&

See this answer on mathematica.stackexchange.com.

alephalpha

Posted 2017-07-05T11:31:44.427

Reputation: 23 988

3

Haskell, 87 bytes

f s|a<-filter(/=0)<$>s=(id=<<a,scanl(+)0$length<$>a,s>>= \t->[i|(i,e)<-zip[0..]t,e/=0])

Try it online!

How it works:

a<-filter(/=0)<$>s           -- let a be the list of lists with all 0 removed]
                             -- e.g. [[1,0,0],[0,3,4]] -> [[1],[3,4]]

                             -- return a triple of

id=<<a                       -- a concatenated into a single list -> A 

scanl(+)0$length<$>a         -- partial sums of the length of the sublists of a
                             -- strating with an additional 0 -> IA

s>>=                         -- map the lambda over the sublists of s and concatenate
                             -- into a single list
   \t->[i|(i,e)<-zip[0..]t,e/=0]  -- the indices of the non-zero elements -> JA

nimi

Posted 2017-07-05T11:31:44.427

Reputation: 34 639

2

APL (Dyalog), 31 28 chars or 36 33 bytes*

Requires ⎕IO←0 for zero based indexing. I/O is list of lists.

{(∊d)(0,+\≢¨d←⍵~¨0)(∊⍸¨⍵≠0)}

Try it online!

{} anonymous function where the argument is represented by

()()() return a list of three things:

  ⍵≠0 Boolean where the argument differs from 0
  ⍸¨ɩndices of those for each sub-list
  ϵnlist (flatten) to combine into single list

  ⍵~¨0 remove zeros from each sub-list of the argument
  d← store that as d
  ≢¨ tally each
  +\ cumulative sum
  0, prepend a zero

  ∊dϵnlist (flatten) d to combine into single list

  


* To run in Dyalog Classic, simply replace with ⎕U2378.

Adám

Posted 2017-07-05T11:31:44.427

Reputation: 37 779

Nice, I don't understand the input format though? f 4 4⍴ and then the values? – Pureferret – 2017-07-05T12:08:05.813

@Pureferret the Code defines the function f. The Input is really a REPL, which calls f on the result of 4 4⍴… which reshapes the data into a 4×4 matrix. – Adám – 2017-07-05T12:09:43.887

1Rho for reshapes. I get it! – Pureferret – 2017-07-05T12:12:24.390

1@Pureferret I've updated the Try it online! link to better show test cases. – Adám – 2017-07-05T12:21:23.777

2

Jelly, 24 bytes

n0S€0;;\S€,T€Ẏ$’$
Ẏḟ0W;Ç

Try it online!

Erik the Outgolfer

Posted 2017-07-05T11:31:44.427

Reputation: 38 134

2

PHP, 107 bytes

<?for($y=[$c=0];$r=$_GET[+$l++];)foreach($r as$k=>$v)!$v?:[$x[]=$v,$z[]=$k,$y[$l]=++$c];var_dump($x,$y,$z);

Try it online!

PHP, 109 bytes

<?$y=[$c=0];foreach($_GET as$r){foreach($r as$k=>$v)if($v){$x[]=$v;$z[]=$k;$c++;}$y[]=$c;}var_dump($x,$y,$z);

Try it online!

Jörg Hülsermann

Posted 2017-07-05T11:31:44.427

Reputation: 13 026

Does this need the numbers to be strings? – Pureferret – 2017-07-05T14:06:00.183

1@Pureferret Any Input in PHP is a string or an array of strings. I have not casted the input so if you wish that the output is purely int replace $x[]=$v with $x[]=+$v – Jörg Hülsermann – 2017-07-05T14:21:41.123

2

JavaScript (ES6), 117 bytes

a=>[a.map((b,i)=>(b=b.filter((x,c)=>x&&o.push(c)),m[i+1]=m[i]+b.length,b),m=[0],o=[]).reduce((x,y)=>x.concat(y)),m,o]

Input is a 2D array of numbers and output is an array of [A, IA, JA].

Explained

a=>[
    a.map((b,i) => (                                // map each matrix row
            b = b.filter((x,c) => x                 // filter to only non-zero elements
                && o.push(c)                        // and add this index to JA
            )
            m[i+1] = m[i] + b.length,               // set next value of IA
            b                                       // and return filtered row
        ),
        m=[0],o=[]                          // initialize IA (m) and JA (o)
    ).reduce((x,y) => x.concat(y)),                 // flatten the non-zero matrix
m,o]                                                // append IA and JA

Tests

let f=
a=>[a.map((b,i)=>(b=b.filter((x,c)=>x&&o.push(c)),m[i+1]=m[i]+b.length,b),m=[0],o=[]).reduce((x,y)=>x.concat(y)),m,o]

let run=x=>O.innerHTML+=f(x).map(s=>`[${s.join`, `}]`).join`\n`+"\n\n"
run([[0,0,0,0],[5,8,0,0],[0,0,3,0],[0,6,0,0]])
run([[10,20,0,0,0,0],[0,30,0,40,0,0],[0,0,50,60,70,0],[0,0,0,0,0,80]])
run([[0,0,0],[0,0,0],[0,0,0]])
run([[1,1,1],[1,1,1],[1,1,1]])
run([[0,0,0,0],[5,-9,0,0],[0,0,0.3,0],[0,-400,0,0]])
<pre id=O></pre>

Justin Mariner

Posted 2017-07-05T11:31:44.427

Reputation: 4 746

1

Japt, 31 27 bytes

Takes input as an array of arrays and returns an array of arrays.

[Uc f U®£X©NpYÃZèÃå+ iT NÅ]

Test it (-Q flag for visualisation purposes only)


Explanation

Implicit input of array U.
[[1,1,1],[1,1,1],[1,1,1]]

Uc f

For the first sub=-array, we flatten (c) U and then filter (f) it, removing any falsey elements (i.e., 0s)
[1,1,1,1,1,1,1,1,1]

U®         Ã

We're going to build the other 2 sub-arrays at the same time, by mapping over U.

£     Ã

We map over each element (sub-array) in U

X is the current element of the current sub-array and © is logical AND (&&) so, if X isn't truthy (not zero) the next part won't be executed.

NpY

In Japt, N is an array containing all the inputs so here, if X is truthy, we push (p) the index (Y) of the current element to N.
[[[1,1,1],[1,1,1],[1,1,1]],0,1,2,0,1,2,0,1,2]

Back to the map of the main array and, for each element (Z), we get the count of elements in that sub-array that are truthy (not zero).
[3,3,3]

å+

Cumulatively reduce this array by summing.
[3,6,9]

iT

Insert (i) 0 at index 0 to complete the second sub-array.
[0,3,6,9]

For the final sub-array, we simply slice N from the 1st element.
[0,1,2,0,1,2,0,1,2]

Shaggy

Posted 2017-07-05T11:31:44.427

Reputation: 24 623

I just ran the other examples though and it works – Pureferret – 2017-07-05T12:08:27.683

1

Python 2, 115 bytes

lambda m:zip(*[[v,i]for k in m for i,v in enumerate(k)if v])+[reduce(lambda a,b:a+[len(b)-b.count(0)+a[-1]],m,[0])]

Try it online!

Output is [A, JA, IA]

ovs

Posted 2017-07-05T11:31:44.427

Reputation: 21 408

1

Perl 6, 84 bytes

{.flatmap(*.grep(+*)),(0,|[\+] .map(+*.grep(+*))),.flat.kv.flatmap:{$^a%.[0]xx?$^b}}

Try it online!

The single matrix argument is in $_.

  • .flatmap(*.grep(+*)) selects the nonzero elements of the entire matrix.
  • [\+] .map(+*.grep(+*)) is the triangular reduction of the number of elements in each row (which some languages call scan). (0,|...) prepends a zero to that list.
  • .flat.kv produces an indexed list of all elements of the matrix. .flatmap: { $^a % .[0] xx ?$^b } flat-maps over the modulus of each index by the number of columns in the array (.[0], the number of elements in the first row), replicated by the element itself, interpreted as a boolean. That is, nonzero elements are replicated once, and zero elements are replicated zero times (ie, removed).

Sean

Posted 2017-07-05T11:31:44.427

Reputation: 4 136

1

Python+SciPy, 79 bytes

i guess built-ins were not forbidden

from scipy.sparse import*
A=csr_matrix(input())
print A.data,A.indptr,A.indices

Accepts input in the format [[0, 0, 0, 0],[5, 8, 0, 0],[0, 0, 3, 0],[0, 6, 0, 0]]

Karl Napf

Posted 2017-07-05T11:31:44.427

Reputation: 4 131