I am trying to write a program that counts the words in a large file using multiple threads, but it crashes with a segmentation fault and I am stuck. Any advice would be appreciated.

Input: a file name (given as the first command-line argument)
Output: Segmentation fault

The code is:
```c
#include <stdio.h>
#include <pthread.h>
#include <stdlib.h>

struct thread_data{
    FILE *fp;
    long int offset;
    int start;
    int blockSize;
};

int words=0;

void *countFrequency(void* data){
    struct thread_data* td=data;
    char *buffer = malloc(td->blockSize);
    int i,c;
    i=0;c=0;
    enum states { WHITESPACE, WORD };
    int state = WHITESPACE;
    fseek(td->fp, td->offset, td->start);
    char last = ' ';
    while ((fread(buffer, td->blockSize, 1, td->fp))==1){
        if ( buffer[0]== ' ' || buffer[0] == '\t' ){
            state = WHITESPACE;
        }
        else if (buffer[0]=='\n'){
            //newLine++;
            state = WHITESPACE;
        }
        else {
            if ( state == WHITESPACE ){
                words++;
            }
            state = WORD;
        }
        last = buffer[0];
    }
    free(buffer);
    pthread_exit(NULL);
    return NULL;
}

int main(int argc, char **argv){
    int nthreads, x, id, blockSize,len;
    //void *state;
    FILE *fp;
    pthread_t *threads;
    struct thread_data data[nthreads];
    if (argc < 2){
        fprintf(stderr, "Usage: ./a.out <file_path>");
        exit(-1);
    }
    if((fp=fopen(argv[1],"r"))==NULL){
        printf("Error opening file");
        exit(-1);
    }
    printf("Enter the number of threads: ");
    scanf("%d",&nthreads);
    threads = malloc(nthreads*sizeof(pthread_t));
    fseek(fp, 0, SEEK_END);
    len = ftell(fp);
    printf("len= %d\n",len);
    blockSize=(len+nthreads-1)/nthreads;
    printf("size= %d\n",blockSize);
    for(id = 0; id < nthreads; id++){
        data[id].fp=fp;
        data[id].offset = blockSize;
        data[id].start = id*blockSize+1;
    }
    //LAST THREAD
    data[nthreads-1].start=(nthreads-1)*blockSize+1;
    for(id = 0; id < nthreads; id++)
        pthread_create(&threads[id], NULL, &countFrequency,&data[id]);
    for(id = 0; id < nthreads; id++)
        pthread_join(threads[id],NULL);
    fclose(fp);
    //free(threads);
    //pthread_exit(NULL);
    printf("%d\n",words);
    return 0;
}
```
Typecasting does not fix wrong code; it only disguises it or makes it even more wrong. Let's look at those errors one at a time.
First, you can't cast a `struct thread_data *` to a `struct thread_data`, nor can you assign a `struct thread_data` to a `struct thread_data *`. The incorrect and unnecessary cast is the sole cause of that error.

Secondly, you can't cast a `struct thread_data` to a `void *` either; you need an actual pointer, such as the address of `data`. No cast is needed there, because a pointer to any object type converts to `void *` implicitly. Of course, since there's only one copy of `data`, all the threads are going to share it, and they will all work on whatever values were written to it last. That's not going to go well; you'll want one `struct thread_data` per thread.

Thirdly, those warnings are telling you that your thread function has the wrong signature: the start routine passed to `pthread_create()` must take a single `void *` argument and return a `void *`. Combined with the first point, once all the types are correct, again no casts are needed.