abdullh.alsoleman / OpenMP_and_CUDA_Homework

Commit e22344f7, authored Jan 31, 2024 by abdullh.alsoleman
Commit message: "The last"
Parent: 092e15e4

Showing 1 changed file with 62 additions and 0 deletions.

Qestion2_CUDA/CUDA.c (new file, +62 -0, mode 0 → 100644)
#include <stdio.h>
#include <stdlib.h>

#define N 10000

__global__ void vector_add(float *out, float *a, float *b, int n) {
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid < n) {
        out[tid] = a[tid] + b[tid];
    }
}

int main() {
    float *a, *b, *out;
    float *d_a, *d_b, *d_out;

    // Allocate host memory
    a = (float *)malloc(sizeof(float) * N);
    b = (float *)malloc(sizeof(float) * N);
    out = (float *)malloc(sizeof(float) * N);

    // Initialize host arrays
    for (int i = 0; i < N; i++) {
        a[i] = i + 1;
        b[i] = 26;
    }

    // Allocate device memory
    cudaMalloc((void **)&d_a, sizeof(float) * N);
    cudaMalloc((void **)&d_b, sizeof(float) * N);
    cudaMalloc((void **)&d_out, sizeof(float) * N);

    // Transfer data from host to device memory
    cudaMemcpy(d_a, a, sizeof(float) * N, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, b, sizeof(float) * N, cudaMemcpyHostToDevice);

    // Adjust the block and grid dimensions for better parallelization
    int block_size = 1024;  // You can experiment with different block sizes
    int grid_size = (N + block_size - 1) / block_size;  // ceiling division so every element gets a thread

    // Executing kernel with multiple blocks
    vector_add<<<grid_size, block_size>>>(d_out, d_a, d_b, N);

    // Transfer data back to host memory
    cudaMemcpy(out, d_out, sizeof(float) * N, cudaMemcpyDeviceToHost);

    // Verification
    // for (int i = 0; i < N; i++) {
    //     printf("%f\n", out[i]);
    // }

    // Deallocate device memory
    cudaFree(d_a);
    cudaFree(d_b);
    cudaFree(d_out);

    // Deallocate host memory
    free(a);
    free(b);
    free(out);

    return 0;
}
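A note on the launch configuration: with block_size = 1024 and N = 10000, grid_size = (10000 + 1024 - 1) / 1024 = 10, so 10 * 1024 = 10240 threads are launched and the if (tid < n) guard in the kernel masks the 240 surplus threads. However, none of the CUDA runtime calls in the file check their return values and the verification loop is commented out, so a failed allocation, copy, or kernel launch would pass silently. The sketch below shows one way error checking and result verification could be added; it is not part of the commit, the CUDA_CHECK macro and check_result helper are names invented for this sketch, and the expected value i + 27 follows from the initialization a[i] = i + 1, b[i] = 26.

// Hedged sketch, not in the committed file: error checking and verification helpers.
// CUDA_CHECK and check_result are illustrative names, not CUDA API functions.
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

// Report and abort on any CUDA runtime error instead of letting it pass silently.
#define CUDA_CHECK(call)                                                   \
    do {                                                                   \
        cudaError_t err_ = (call);                                         \
        if (err_ != cudaSuccess) {                                         \
            fprintf(stderr, "CUDA error %s at %s:%d\n",                    \
                    cudaGetErrorString(err_), __FILE__, __LINE__);         \
            exit(EXIT_FAILURE);                                            \
        }                                                                  \
    } while (0)

// With a[i] = i + 1 and b[i] = 26, out[i] should be exactly i + 27
// (i + 27 is exactly representable in float for i < N, so exact comparison is safe here).
static void check_result(const float *out, int n) {
    for (int i = 0; i < n; i++) {
        if (out[i] != (float)(i + 27)) {
            fprintf(stderr, "Mismatch at %d: got %f, expected %f\n",
                    i, out[i], (float)(i + 27));
            exit(EXIT_FAILURE);
        }
    }
    printf("Verification passed for %d elements\n", n);
}

In main, the cudaMalloc and cudaMemcpy calls would be wrapped in CUDA_CHECK, CUDA_CHECK(cudaGetLastError()) and CUDA_CHECK(cudaDeviceSynchronize()) would follow the kernel launch, and check_result(out, N) would replace the commented-out printf loop. Since nvcc infers the source language from the file extension, this .c file may also need to be compiled with something like nvcc -x cu Qestion2_CUDA/CUDA.c -o vector_add for the __global__ qualifier and <<<...>>> launch syntax to be accepted.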